# Combos - Custom Fallback Chains

Create custom model combinations with automatic fallback. Combos let you define your own routing strategy based on cost, quality, and availability.

---

## What Are Combos?

Combos are **custom fallback chains** that you create in the dashboard. Instead of using a single model, you define a sequence of models that 9Router tries in order.

**Example:**
```
Combo name: premium-coding
Models:
  1. cc/claude-opus-4-5-20251101 (try first)
  2. glm/glm-4.7 (if #1 quota exhausted)
  3. minimax/MiniMax-M2.1 (if #2 quota exhausted)
```

**Usage in CLI:**
```
Model: premium-coding
```

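
To make the try-in-order behavior concrete, here is a minimal Python sketch that walks a combo's model list and returns the first successful response. It is an illustration only, not 9Router's actual implementation: `QuotaExhausted`, `run_combo`, and the `call_model` callback are hypothetical names chosen for this example.

```python
from typing import Callable, Sequence


class QuotaExhausted(Exception):
    """Hypothetical error raised when a model has no quota left."""


def run_combo(models: Sequence[str], call_model: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in priority order; return (model, response) for the first success."""
    failures = []
    for model in models:
        try:
            return model, call_model(model)
        except QuotaExhausted as exc:
            failures.append(f"{model}: {exc}")  # remember why we moved on
    raise RuntimeError("all models in combo failed: " + "; ".join(failures))


premium_coding = ["cc/claude-opus-4-5-20251101", "glm/glm-4.7", "minimax/MiniMax-M2.1"]


def fake_call(model: str) -> str:
    """Stand-in for a real provider call: the first provider is out of quota."""
    if model.startswith("cc/"):
        raise QuotaExhausted("quota exhausted")
    return f"response from {model}"


used_model, reply = run_combo(premium_coding, fake_call)
print(used_model)  # the second model in the chain served the request
```
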
9Router automatically tries each model in sequence until one succeeds.

---

## Why Use Combos?

### 1. Maximize Subscription Value
```
cc/claude-opus → glm/glm-4.7 → if/kimi-k2-thinking

→ Use subscription first, cheap backup, free emergency
→ Get full value from subscriptions you already pay for
```

### 2. Minimize Costs
```
glm/glm-4.7 → minimax/MiniMax-M2.1 → if/kimi-k2-thinking

→ Start with a cheap paid option ($0.60/1M)
→ Fallback to an even cheaper one ($0.20/1M)
→ Emergency free tier
→ Total cost: ~$5-10/month vs $2000 on ChatGPT API
```

### 3. Ensure 24/7 Availability
```
cc/claude-opus → cx/gpt-5.2-codex → glm/glm-4.7 → if/kimi-k2-thinking

→ Always include free tier at the end
→ Never run out of quota
→ Code anytime, anywhere
```

### 4. Optimize for Quality
```
cc/claude-opus-4-5 → cx/gpt-5.2-codex → gc/gemini-3-pro

→ Best models first
→ Fallback to other premium models
→ Maintain high quality across fallback chain
```

---

## How to Create Combos

### Step 1: Open Dashboard

```
http://localhost:20128
→ Login with your password
```

### Step 2: Navigate to Combos

```
Dashboard → Combos → Create New Combo
```

### Step 3: Configure Combo

**Combo Name:**
```
premium-coding
```

**Description (optional):**
```
Subscription first, cheap backup, free emergency
```

**Select Models:**
```
1. cc/claude-opus-4-5-20251101
2. glm/glm-4.7
3. minimax/MiniMax-M2.1
```

**Drag to reorder** - Priority runs from top to bottom, so the model at the top is tried first.

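
Because priority is simply list order, a combo can be pictured as a name plus an ordered list of models. The sketch below is an illustration only: the `Combo` dataclass and the `move_model` helper are hypothetical and do not reflect how the dashboard actually stores combos.

```python
from dataclasses import dataclass, field


@dataclass
class Combo:
    """Hypothetical in-memory picture of a combo: models[0] is tried first."""
    name: str
    models: list[str] = field(default_factory=list)
    description: str = ""


def move_model(combo: Combo, model: str, new_index: int) -> None:
    """Mimic drag-to-reorder: move `model` to `new_index` (0 = highest priority)."""
    combo.models.remove(model)            # raises ValueError if the model is not in the combo
    combo.models.insert(new_index, model)


combo = Combo(
    name="premium-coding",
    description="Subscription first, cheap backup, free emergency",
    models=["cc/claude-opus-4-5-20251101", "glm/glm-4.7", "minimax/MiniMax-M2.1"],
)
move_model(combo, "minimax/MiniMax-M2.1", 1)  # promote MiniMax above GLM
print(combo.models)
```
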
### Step 4: Save

```
Click "Save Combo"
→ Combo appears in model list
```

### Step 5: Use in CLI

```
Cursor/Cline/Any tool:
  Model: premium-coding
```

---

## Example Combos

### Example 1: Premium Coding (Subscription → Cheap → Free)

**Goal**: Maximize subscription value, minimize extra costs.

```
Dashboard → Combos → Create New

Name: premium-coding
Models:
  1. cc/claude-opus-4-5-20251101
  2. glm/glm-4.7
  3. minimax/MiniMax-M2.1
```

**Usage:**
```
Cursor IDE:
  Model: premium-coding
```

**Behavior:**
```
Morning (fresh quota):
  Request → cc/claude-opus-4-5 ✅

Afternoon (Claude quota out):
  Request → glm/glm-4.7 ✅ (auto switched)

Evening (GLM quota out):
  Request → minimax/MiniMax-M2.1 ✅ (auto switched)
```

**Monthly cost (100M tokens):**
```
80M via Claude Code: $0 (subscription)
15M via GLM: $9
5M via MiniMax: $1
Total: $10 + your subscription
```

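
The totals in the cost table follow from multiplying each provider's token volume by its per-million price. The short Python check below uses the $0.60/1M (GLM) and $0.20/1M (MiniMax) prices quoted earlier in this document and treats subscription usage as $0 extra; the `extra_cost` helper and the short provider keys are hypothetical names for this example.

```python
# Per-million-token prices quoted in this document; subscription usage adds no extra cost.
PRICE_PER_MILLION = {"cc": 0.0, "glm": 0.60, "minimax": 0.20}


def extra_cost(usage_millions: dict[str, float]) -> float:
    """Return the pay-per-token cost in dollars for a {provider: million tokens} split."""
    return sum(PRICE_PER_MILLION[p] * m for p, m in usage_millions.items())


usage = {"cc": 80, "glm": 15, "minimax": 5}  # 100M tokens in total
total = extra_cost(usage)
savings = 1 - total / 2000                    # versus the $2000 figure used in this document
print(f"${total:.2f} extra, {savings:.1%} saved")
```
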
**Savings**: ~99% vs ChatGPT API ($10 extra vs $2000), not counting the subscription you already pay for.

---

### Example 2: Budget Combo (Cheap → Free)

**Goal**: Minimize costs, use free tier as backup.

```
Dashboard → Combos → Create New

Name: budget-combo
Models:
  1. glm/glm-4.7
  2. minimax/MiniMax-M2.1
  3. if/kimi-k2-thinking
```

**Usage:**
```
Cline:
  Provider: OpenAI Compatible
  Base URL: http://localhost:20128/v1
  Model: budget-combo
```

**Behavior:**
```
Request → glm/glm-4.7
  ✅ Daily quota available → Use GLM ($0.60/1M)
  ❌ Quota exhausted → Try MiniMax ($0.20/1M)
  ❌ MiniMax quota out → Use iFlow (FREE)
```

**Monthly cost (100M tokens):**
```
70M via GLM: $42
20M via MiniMax: $4
10M via iFlow: $0
Total: $46 vs $2000 on ChatGPT API
```

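
One way to picture the 70M / 20M / 10M split is as a greedy spill-over: each model absorbs tokens until its quota is used up, and the remainder falls through to the next model in the chain. The quotas below (70M for GLM, 20M for MiniMax, unlimited for iFlow) are hypothetical values picked to reproduce the split in the cost table, and `allocate` is an illustrative helper, not a 9Router function.

```python
import math


def allocate(total_millions: float, chain: list[tuple[str, float]]) -> dict[str, float]:
    """Greedily assign tokens to models in chain order, respecting each model's quota."""
    remaining = total_millions
    split = {}
    for model, quota in chain:
        used = min(remaining, quota)  # take as much as this model still allows
        split[model] = used
        remaining -= used
    if remaining > 0:
        raise RuntimeError(f"{remaining}M tokens could not be served by any model")
    return split


budget_combo = [
    ("glm/glm-4.7", 70),                # hypothetical monthly quota, in millions of tokens
    ("minimax/MiniMax-M2.1", 20),       # hypothetical
    ("if/kimi-k2-thinking", math.inf),  # free tier treated as unlimited
]
print(allocate(100, budget_combo))
```
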
**Savings**: ~97.7% ($46 vs $2000).

---

### Example 3: Free Combo (Zero Cost)

**Goal**: 100% free, no costs ever.

```
Dashboard → Combos → Create New

Name: free-combo
Models:
  1. if/kimi-k2-thinking
  2. qw/qwen3-coder-plus
  3. kr/claude-sonnet-4.5
```

**Usage:**
```
Claude Desktop:
  Model: free-combo
```

**Behavior:**
```
Request → if/kimi-k2-thinking
  ✅ Available → Use iFlow
  ❌ Error → Try Qwen
  ❌ Error → Try Kiro
```

**Monthly cost:**
```
100M tokens via free providers: $0
Total: $0 forever
```

**Use case**: Personal projects, learning, experimentation.

---

### Example 4: Quality First (Premium Models Only)

**Goal**: Best quality, no cheap fallback.

```
Dashboard → Combos → Create New

Name: quality-first
Models:
  1. cc/claude-opus-4-5-20251101
  2. cx/gpt-5.2-codex
  3. gc/gemini-3-pro-preview
```

**Usage:**
```
Codex CLI:
  export OPENAI_BASE_URL="http://localhost:20128"
  Model: quality-first
```

**Behavior:**
```
Request → cc/claude-opus-4-5
  ❌ Quota out → cx/gpt-5.2-codex
  ❌ Quota out → gc/gemini-3-pro-preview
  ❌ All out → Return error (no cheap fallback)
```

**Use case**: Critical production code, complex refactoring.

---

### Example 5: Multi-Subscription (Maximize All)

**Goal**: Use all subscriptions before paying extra.

```
Dashboard → Combos → Create New

Name: multi-sub
Models:
  1. gc/gemini-3-flash-preview (FREE 180K/month)
  2. cc/claude-opus-4-5-20251101 (Pro subscription)
  3. cx/gpt-5.2-codex (Plus subscription)
  4. gh/gpt-5 (Copilot subscription)
  5. glm/glm-4.7 (Cheap backup)
  6. if/kimi-k2-thinking (Free emergency)
```

**Monthly cost (200M tokens):**
```
50M via Gemini CLI: $0 (free tier)
80M via Claude Code: $0 (subscription)
40M via Codex: $0 (subscription)
20M via Copilot: $0 (subscription)
8M via GLM: $4.80
2M via iFlow: $0
Total: $4.80 + existing subscriptions
```

**Result**: 190M tokens come from subscriptions and the Gemini free tier, 2M from iFlow's free tier, and only 8M are paid for ($4.80 extra).

---

### Example 6: Quota Reset Optimization

**Goal**: Distribute usage based on reset times.

```
Dashboard → Combos → Create New

Name: reset-optimized
Models:
  1. cc/claude-opus-4-5 (5h reset, use morning)
  2. gc/gemini-3-flash (1K/day, use afternoon)
  3. glm/glm-4.7 (daily 10AM reset, use evening)
  4. minimax/MiniMax-M2.1 (5h rolling, use night)
  5. if/kimi-k2-thinking (unlimited, emergency)
```

**Daily routine:**
```
08:00 - 13:00: Claude Code (fresh 5h quota)
13:00 - 18:00: Gemini CLI (1K/day quota)
18:00 - 22:00: GLM (resets 10AM next day)
22:00 - 08:00: MiniMax (5h rolling) or iFlow
```

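
The routine above is essentially a lookup from hour of day to the provider expected to be serving requests. The following Python sketch encodes that table; `expected_provider` is a hypothetical helper, and in practice it is the combo's fallback order and the actual quota state that decide the switch, not the clock.

```python
# (start_hour, end_hour, provider) rows copied from the daily routine above.
# The last window wraps past midnight, so it is stored as 22 -> 8.
ROUTINE = [
    (8, 13, "Claude Code"),
    (13, 18, "Gemini CLI"),
    (18, 22, "GLM"),
    (22, 8, "MiniMax or iFlow"),
]


def expected_provider(hour: int) -> str:
    """Return the provider the routine expects at `hour` (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    for start, end, provider in ROUTINE:
        if start < end:
            in_window = start <= hour < end
        else:  # window wraps around midnight
            in_window = hour >= start or hour < end
        if in_window:
            return provider
    raise AssertionError("routine does not cover every hour")


print(expected_provider(9), "|", expected_provider(23))
```
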
**Result**: Code 24/7 with minimal costs.

---

## Use Combos in CLI Tools

### Cursor IDE

```
Settings → Models → Advanced:
  OpenAI API Base URL: http://localhost:20128/v1
  OpenAI API Key: [from dashboard]
  Model: premium-coding
```

### Claude Desktop

Edit `~/.claude/config.json`:
```json
{
  "anthropic_api_base": "http://localhost:20128/v1",
  "anthropic_api_key": "your-9router-api-key",
  "model": "budget-combo"
}
```

### Codex CLI

```bash
export OPENAI_BASE_URL="http://localhost:20128"
export OPENAI_API_KEY="your-9router-api-key"

codex --model quality-first "your prompt"
```

### Cline / Continue / RooCode

```
Provider: OpenAI Compatible
Base URL: http://localhost:20128/v1
API Key: [from dashboard]
Model: free-combo
```

### API Request

```bash
curl http://localhost:20128/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "premium-coding",
    "messages": [
      {"role": "user", "content": "Write a function to..."}
    ],
    "stream": true
  }'
```

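
The same request can be prepared from Python with only the standard library. This sketch builds the request object without sending it: `build_chat_request` is a hypothetical helper, the API key is a placeholder, and streaming is switched off here so that a response would arrive as a single JSON body.

```python
import json
import urllib.request


def build_chat_request(combo: str, prompt: str, api_key: str,
                       base_url: str = "http://localhost:20128/v1") -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible chat endpoint, using a combo name as the model."""
    payload = {
        "model": combo,  # a combo name is passed exactly like a model name
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("premium-coding", "Write a function to...", "your-api-key")
print(request.full_url)
# To send it (requires a running 9Router instance):
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```
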
---
## Best Practices

### 1. Always Include Free Tier

```
✅ Good:
cc/claude-opus → glm/glm-4.7 → if/kimi-k2-thinking

❌ Bad:
cc/claude-opus → glm/glm-4.7
(no free fallback, can run out of quota)
```

**Why**: Ensures 24/7 availability, never blocked by quota.

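
A quick way to audit combos against this rule is to check the provider prefix of the last model. The sketch below assumes that the `if/`, `qw/`, and `kr/` prefixes denote free providers, as in the free combo example in this document; `ends_with_free_tier` is a hypothetical helper.

```python
# Assumption: these prefixes mark free providers (taken from the free-combo example).
FREE_PREFIXES = ("if/", "qw/", "kr/")


def ends_with_free_tier(models: list[str]) -> bool:
    """Return True if the last model in the chain comes from a free provider."""
    return bool(models) and models[-1].startswith(FREE_PREFIXES)


good = ["cc/claude-opus", "glm/glm-4.7", "if/kimi-k2-thinking"]
bad = ["cc/claude-opus", "glm/glm-4.7"]
print(ends_with_free_tier(good), ends_with_free_tier(bad))  # True False
```
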
### 2. Order by Cost (Cheap to Expensive)

```
✅ Good:
glm/glm-4.7 → minimax/MiniMax-M2.1 → cc/claude-opus

❌ Bad:
cc/claude-opus → glm/glm-4.7
(wastes subscription quota on simple tasks)
```

**Exception**: If you want to maximize subscription value, put subscription first.

### 3. Match Quality Requirements

```
For production code:
cc/claude-opus → cx/gpt-5.2-codex → glm/glm-4.7

For quick tasks:
glm/glm-4.7 → if/kimi-k2-thinking

For experimentation:
if/kimi-k2-thinking → qw/qwen3-coder-plus
```

### 4. Consider Quota Reset Times

```
Morning combo (fresh quotas):
cc/claude-opus → cx/gpt-5.2-codex

Evening combo (quotas likely exhausted):
glm/glm-4.7 → minimax/MiniMax-M2.1 → if/kimi-k2-thinking
```

### 5. Create Multiple Combos for Different Use Cases

```
premium-coding: For complex tasks
budget-combo: For simple tasks
free-combo: For experimentation
quality-first: For production code
```

**Switch between combos** based on task requirements.

### 6. Monitor Combo Performance

```
Dashboard → Analytics → Combo Usage:
  premium-coding:
    80% via cc/claude-opus (good, using subscription)
    15% via glm/glm-4.7 (acceptable backup)
    5% via minimax (rare fallback)
```

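
The percentages in such a breakdown are just per-model request counts divided by the total. The following sketch computes them from a list of served-model names and flags heavy fallback use. The log format, the `usage_shares` helper, and the 30% threshold are all hypothetical, and the helper assumes a non-empty log.

```python
from collections import Counter


def usage_shares(served_models: list[str]) -> dict[str, float]:
    """Return each model's share of requests as a fraction between 0 and 1."""
    counts = Counter(served_models)
    total = sum(counts.values())
    return {model: n / total for model, n in counts.items()}


# Hypothetical log: which model ended up serving each of 20 requests.
log = ["cc/claude-opus"] * 16 + ["glm/glm-4.7"] * 3 + ["minimax/MiniMax-M2.1"]
shares = usage_shares(log)
fallback_share = 1 - shares["cc/claude-opus"]  # everything not served by the primary model

print({model: f"{share:.0%}" for model, share in shares.items()})
if fallback_share > 0.30:  # hypothetical alert threshold
    print("Fallback usage is high: consider reordering or raising the primary quota")
```
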
**Optimize**: If fallback usage is too high, increase the primary model's quota or reorder the models.

---

## Advanced Configuration

### Set Budget Limits per Combo

```
Dashboard → Combos → Edit → Budget:
  Daily limit: $5
  Monthly limit: $50
```

When a limit is reached, 9Router skips paid models and uses the free tier only.

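
The skip-paid-models rule can be pictured as a filter over the chain. This is an illustration only, not 9Router's implementation: the `effective_chain` helper, the set of free models, and the spend figures are hypothetical.

```python
def effective_chain(models: list[str], free_models: set[str],
                    spent: float, limit: float) -> list[str]:
    """Return eligible models: all of them under budget, only free ones once the limit is hit."""
    if spent < limit:
        return list(models)
    return [m for m in models if m in free_models]


chain = ["glm/glm-4.7", "minimax/MiniMax-M2.1", "if/kimi-k2-thinking"]
free = {"if/kimi-k2-thinking"}  # assumption: iFlow is the free tier in this combo

print(effective_chain(chain, free, spent=3.20, limit=5.00))  # under the $5 daily limit
print(effective_chain(chain, free, spent=5.00, limit=5.00))  # limit reached
```
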
### Enable/Disable Models in Combo

```
Dashboard → Combos → Edit → Models:
  ✅ cc/claude-opus-4-5 (enabled)
  ❌ glm/glm-4.7 (temporarily disabled)
  ✅ if/kimi-k2-thinking (enabled)
```

**Use case**: Temporarily disable expensive models without deleting combo.

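
Disabling a model takes it out of the fallback order without changing the saved list. A minimal sketch, assuming each entry carries an enabled flag; the `(model, enabled)` tuple layout and the `active_models` helper are hypothetical.

```python
def active_models(entries: list[tuple[str, bool]]) -> list[str]:
    """Keep the saved order but drop entries whose enabled flag is False."""
    return [model for model, enabled in entries if enabled]


combo_entries = [
    ("cc/claude-opus-4-5", True),
    ("glm/glm-4.7", False),  # temporarily disabled
    ("if/kimi-k2-thinking", True),
]
print(active_models(combo_entries))
```
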
### Clone Existing Combo

```
Dashboard → Combos → Clone "premium-coding"
→ Creates copy with "-copy" suffix
→ Modify and save as new combo
```

**Use case**: Create variations for different scenarios.

---

## Troubleshooting

**Issue: Combo not appearing in model list**

**Solution:**
1. Refresh the dashboard
2. Check that the combo is saved (green checkmark)
3. Restart the CLI tool to refresh its model list

**Issue: Combo always uses last model (free tier)**

**Solution:**
1. Check quota for primary models (Dashboard → Quota)
2. Verify API keys are valid (Dashboard → Providers)
3. Check that budget limits have not been exceeded

**Issue: Combo costs more than expected**

**Solution:**
1. Dashboard → Analytics → Review combo usage
2. Check if primary models are quota-exhausted
3. Reorder models (put cheaper first)
4. Set budget limits

---

## Related

- [Smart Routing](./smart-routing.md) - How auto fallback works
- [Quota Tracking](./quota-tracking.md) - Monitor usage and costs