# Smart Routing & Auto Fallback

9Router automatically routes your requests through the best available provider using a 3-tier fallback system. Never stop coding due to quota limits or rate limiting.

---
## How It Works

9Router uses intelligent routing to maximize your existing subscriptions, minimize costs, and ensure 24/7 availability:

```
Request → 9Router → Check Tier 1 (Subscription)
                    ↓ quota exhausted
                    Check Tier 2 (Cheap)
                    ↓ budget limit
                    Check Tier 3 (Free)
                    ↓
                    Response
```
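The tier flow above could be sketched roughly as follows. This is a minimal illustration, not 9Router's actual implementation: the `Provider` class, its `available` flag, and `pick_provider` are hypothetical names, and the model IDs are simply the ones used as examples on this page.

```python
from dataclasses import dataclass

# Tier order taken from the diagram above; everything else is hypothetical.
TIER_ORDER = ("subscription", "cheap", "free")


@dataclass
class Provider:
    model: str       # e.g. "cc/claude-opus-4-5"
    tier: str        # one of TIER_ORDER
    available: bool  # False once quota or budget is used up


def pick_provider(providers):
    """Return the first available provider, walking the tiers in order."""
    for tier in TIER_ORDER:
        for provider in providers:
            if provider.tier == tier and provider.available:
                return provider
    return None  # every tier is exhausted


providers = [
    Provider("cc/claude-opus-4-5", "subscription", available=False),
    Provider("glm/glm-4.7", "cheap", available=True),
    Provider("if/kimi-k2-thinking", "free", available=True),
]
print(pick_provider(providers).model)  # glm/glm-4.7
```

Within a tier, the sketch keeps the order in which providers were listed, so reordering the list changes which provider is tried first.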
### 3-Tier Fallback System

**Tier 1: SUBSCRIPTION (Primary)**
- Claude Code (Pro/Max)
- OpenAI Codex (Plus/Pro)
- Gemini CLI (FREE 180K/month)
- GitHub Copilot
- Antigravity (Google)

**Goal**: Maximize value from subscriptions you already pay for.

**Tier 2: CHEAP (Backup)**
- GLM-4.7 ($0.60/1M input)
- MiniMax M2.1 ($0.20/1M input)
- Kimi K2 ($9/month flat)

**Goal**: Ultra-cheap backup when subscription quota runs out (~90% cheaper than ChatGPT API).

**Tier 3: FREE (Emergency)**
- iFlow (8 models)
- Qwen (3 models)
- Kiro (Claude FREE)

**Goal**: Zero-cost fallback for unlimited coding.

---
## Automatic Switching

9Router monitors quota in real time and switches providers automatically:

### Scenario 1: Subscription Quota Exhausted

```
User request → cc/claude-opus-4-5
↓ quota exhausted (5-hour limit reached)
Auto switch → glm/glm-4.7
↓ daily quota exhausted
Auto switch → minimax/MiniMax-M2.1
↓ 5-hour quota exhausted
Auto switch → if/kimi-k2-thinking (FREE)
↓
Response delivered ✅
```

**Result**: Zero downtime, seamless experience.

### Scenario 2: Rate Limiting

```
User request → cx/gpt-5.2-codex
↓ rate limited (too many requests)
Auto switch → glm/glm-4.7
↓
Response delivered ✅
```

### Scenario 3: Provider Unavailable

```
User request → cc/claude-opus-4-5
↓ provider error (503)
Auto switch → next available model
↓
Response delivered ✅
```
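The three scenarios share one pattern: try a model, and move to the next one only when the failure is a quota, rate-limit, or availability problem. A minimal sketch of that pattern follows. `ProviderError`, `should_switch`, `route`, and the `"quota_exhausted"` reason string are all hypothetical; mapping "too many requests" to HTTP 429 is an assumption based on the standard status code, while 503 comes from Scenario 3.

```python
class ProviderError(Exception):
    """Hypothetical error carrying an HTTP-style status and a reason tag."""

    def __init__(self, status, reason=""):
        super().__init__(f"{status} {reason}")
        self.status = status
        self.reason = reason


def should_switch(err):
    """Decide whether a failure should trigger a fallback (assumed rules)."""
    if err.reason == "quota_exhausted":  # Scenario 1
        return True
    return err.status in (429, 503)      # Scenarios 2 and 3


def route(chain, send):
    """Try each model in order; fall through only on switchable failures."""
    last_error = None
    for model in chain:
        try:
            return model, send(model)
        except ProviderError as err:
            if not should_switch(err):
                raise  # e.g. a malformed request should not be retried elsewhere
            last_error = err
    raise RuntimeError("all providers exhausted") from last_error


def fake_send(model):
    # Stand-in for a real provider call: the first model is rate limited.
    if model == "cx/gpt-5.2-codex":
        raise ProviderError(429, "rate_limited")
    return "ok"


used, reply = route(["cx/gpt-5.2-codex", "glm/glm-4.7"], fake_send)
print(used, reply)  # glm/glm-4.7 ok
```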
---

## Model Selection Logic

9Router selects the best model based on:

1. **Quota availability** - Check whether the provider has remaining quota
2. **Cost tier** - Prefer subscription → cheap → free
3. **Reset timing** - Consider when each quota resets
4. **Provider health** - Skip providers that are returning errors

### Priority Order Example

For a request to `cc/claude-opus-4-5`:

```
1. Check Claude Code quota
   ✅ Available → Use cc/claude-opus-4-5
   ❌ Exhausted → Continue to step 2

2. Check fallback tier (if configured)
   ✅ GLM quota available → Use glm/glm-4.7
   ❌ Exhausted → Continue to step 3

3. Check free tier
   ✅ iFlow available → Use if/kimi-k2-thinking
   ❌ All exhausted → Return quota error
```
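The four criteria above can be combined in a small selection function. The sketch below is one possible reading, not the actual selection code: `Candidate`, `select`, and `soonest_reset` are hypothetical, and "reset timing" is interpreted here only as "when everything is exhausted, report which quota comes back first".

```python
from dataclasses import dataclass

TIER_RANK = {"subscription": 0, "cheap": 1, "free": 2}


@dataclass
class Candidate:
    model: str
    tier: str
    quota_left: float      # fraction of quota remaining, 0.0 to 1.0
    minutes_to_reset: int  # time until this provider's quota resets
    healthy: bool = True


def select(candidates):
    """Pick the healthy candidate with quota left in the cheapest-to-you tier."""
    usable = [c for c in candidates if c.healthy and c.quota_left > 0]
    if not usable:
        return None
    # min() returns the first item on ties, preserving the configured order.
    return min(usable, key=lambda c: TIER_RANK[c.tier])


def soonest_reset(candidates):
    """Among healthy but exhausted candidates, find the one that resets first."""
    exhausted = [c for c in candidates if c.healthy and c.quota_left <= 0]
    return min(exhausted, key=lambda c: c.minutes_to_reset, default=None)


candidates = [
    Candidate("cc/claude-opus-4-5", "subscription", 0.0, 90),
    Candidate("glm/glm-4.7", "cheap", 0.4, 600),
    Candidate("if/kimi-k2-thinking", "free", 1.0, 0),
]
print(select(candidates).model)         # glm/glm-4.7
print(soonest_reset(candidates).model)  # cc/claude-opus-4-5
```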
---

## Configuration Options

### Dashboard Settings

**1. Enable/Disable Auto Fallback**

```
Dashboard → Settings → Smart Routing
  → Toggle "Auto Fallback" ON/OFF
```

- **ON** (default): Automatic tier switching
- **OFF**: Strict mode; returns an error if the primary model is unavailable

**2. Set Budget Limits**

```
Dashboard → Settings → Budget Control
  → Daily limit: $5
  → Monthly limit: $50
```

When a budget limit is reached, 9Router automatically switches to the free tier.

**3. Configure Fallback Order**

```
Dashboard → Settings → Fallback Priority
  → Drag to reorder providers within each tier
```

Example custom order:
```
Tier 1: Gemini CLI → Claude Code → Codex
Tier 2: MiniMax → GLM → Kimi
Tier 3: iFlow → Kiro → Qwen
```

**4. Quota Reset Notifications**

```
Dashboard → Settings → Notifications
  → Email when quota resets
  → Alert when 80% quota used
```

---
## Examples

### Example 1: Basic Auto Fallback

**Setup:**
```
Model: cc/claude-opus-4-5-20251101
Fallback: Auto (default 3-tier)
```

**Behavior:**
```
Morning (fresh quota):
  Request → cc/claude-opus-4-5 ✅

Afternoon (quota exhausted):
  Request → glm/glm-4.7 ✅ (auto switched)

Evening (GLM quota out):
  Request → minimax/MiniMax-M2.1 ✅ (auto switched)

Late night (all paid quota out):
  Request → if/kimi-k2-thinking ✅ (free tier)
```

**Cost**: ~$5-10/month extra (mostly covered by subscription).
### Example 2: Budget-Conscious Routing

**Setup:**
```
Dashboard → Settings:
  Daily budget: $2
  Monthly budget: $20
  Fallback: Enabled
```

**Behavior:**
```
Day 1-13 (within budget):
  Requests → glm/glm-4.7 (cheap tier)
  Cost: $1.50/day

Day 14 (monthly budget reached):
  Requests → if/kimi-k2-thinking (free tier)
  Cost: $0

Next month (budget resets):
  Requests → glm/glm-4.7 again
```
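One way to picture the budget behavior is a running spend check made before each request. The sketch below is an illustration under stated assumptions, not 9Router's code: `BudgetGuard` and its methods are hypothetical, and the simulation assumes three requests a day at $0.50 each to match the $1.50/day figure with a $2 daily and $20 monthly limit.

```python
class BudgetGuard:
    """Hypothetical spend tracker that decides between cheap and free tiers."""

    def __init__(self, daily_limit, monthly_limit):
        self.daily_limit = daily_limit
        self.monthly_limit = monthly_limit
        self.spent_today = 0.0
        self.spent_month = 0.0

    def start_new_day(self):
        self.spent_today = 0.0

    def record(self, cost):
        self.spent_today += cost
        self.spent_month += cost

    def tier(self):
        # Once either limit is reached, only the free tier is allowed.
        if self.spent_today >= self.daily_limit:
            return "free"
        if self.spent_month >= self.monthly_limit:
            return "free"
        return "cheap"


guard = BudgetGuard(daily_limit=2.00, monthly_limit=20.00)
first_free_day = None
for day in range(1, 31):
    guard.start_new_day()
    for _ in range(3):  # three requests a day at $0.50 each = $1.50/day
        if guard.tier() == "cheap":
            guard.record(0.50)
        elif first_free_day is None:
            first_free_day = day

# $20 / $1.50 per day is about 13.3 days, so the cap is hit during day 14.
print(first_free_day, guard.spent_month)  # 14 20.0
```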
**Result**: Never exceed $20/month, always available.

### Example 3: Subscription-Only Mode

**Setup:**
```
Dashboard → Settings:
  Auto Fallback: OFF
  Strict mode: ON
```

**Behavior:**
```
Request → cc/claude-opus-4-5
  ✅ Quota available → Success
  ❌ Quota exhausted → Return error (no fallback)
```

**Use case**: When you want to use only your paid subscriptions and avoid extra costs.

### Example 4: Free-Only Mode

**Setup:**
```
Model: if/kimi-k2-thinking
Fallback: qw/qwen3-coder-plus → kr/claude-sonnet-4.5
```

**Behavior:**
```
All requests → Free tier only
Cost: $0 forever
```

**Use case**: Personal projects, learning, experimentation.

---
## Best Practices

### 1. Maximize Subscription Value

```
Strategy:
- Set subscription models as Tier 1
- Monitor quota usage in dashboard
- Use cheap tier only when subscription exhausted
```

**Example combo:**
```
cc/claude-opus-4-5 → glm/glm-4.7 → if/kimi-k2-thinking
```
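A combo like the one above is just an ordered list of `prefix/model` entries. The sketch below parses this page's arrow shorthand into such a list. It is illustrative only: the arrow notation is not necessarily a format 9Router accepts, `parse_combo` is hypothetical, and the prefix-to-provider mapping is inferred from the examples on this page rather than taken from a reference.

```python
# Prefixes as they appear in this page's examples (mapping inferred, not official).
PROVIDER_PREFIXES = {
    "cc": "Claude Code",
    "cx": "Codex",
    "gc": "Gemini CLI",
    "glm": "GLM",
    "minimax": "MiniMax",
    "if": "iFlow",
    "qw": "Qwen",
    "kr": "Kiro",
}


def parse_combo(combo):
    """Split 'a/x → b/y' into an ordered list of (provider, model) pairs."""
    chain = []
    for part in combo.split("→"):
        entry = part.strip()
        prefix, _, model = entry.partition("/")
        if not model or prefix not in PROVIDER_PREFIXES:
            raise ValueError(f"unrecognized combo entry: {entry!r}")
        chain.append((PROVIDER_PREFIXES[prefix], model))
    return chain


chain = parse_combo("cc/claude-opus-4-5 → glm/glm-4.7 → if/kimi-k2-thinking")
for provider, model in chain:
    print(provider, model)
```

Keeping the chain as an ordered list makes the fallback order explicit: the first entry is the primary model and each later entry is tried only after the ones before it.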
### 2. Optimize for Cost

```
Strategy:
- Use Gemini CLI free tier first (180K/month)
- Fallback to GLM/MiniMax (ultra-cheap)
- Emergency: iFlow (free)
```

**Example combo:**
```
gc/gemini-3-flash-preview → glm/glm-4.7 → if/kimi-k2-thinking
```

### 3. Optimize for Quality

```
Strategy:
- Use best models (Claude Opus, GPT-5.2)
- Fallback to good cheap models (GLM-4.7)
- Last resort: Free tier
```

**Example combo:**
```
cc/claude-opus-4-5 → cx/gpt-5.2-codex → glm/glm-4.7
```

### 4. 24/7 Availability

```
Strategy:
- Always include free tier in fallback
- Monitor quota reset times
- Distribute usage across providers
```

**Example combo:**
```
cc/claude-opus-4-5 → glm/glm-4.7 → minimax/MiniMax-M2.1 → if/kimi-k2-thinking
```

**Result**: Never run out of quota, code anytime.

---
## Quota Reset Strategy

Plan your usage around quota reset times:

| Provider | Quota Reset | Strategy |
|----------|-------------|----------|
| **Claude Code** | 5-hour + weekly | Use in morning, fresh quota |
| **Codex** | 5-hour + weekly | Use after Claude quota out |
| **Gemini CLI** | Daily (1K) + Monthly (180K) | Use throughout day |
| **GLM-4.7** | Daily 10:00 AM | Use evening, resets next morning |
| **MiniMax M2.1** | 5-hour rolling | Use anytime, tracks rolling window |
| **iFlow/Qwen/Kiro** | No limit | Emergency backup |

**Daily routine example:**
```
08:00 - 13:00: Claude Code (fresh 5h quota)
13:00 - 18:00: Gemini CLI (1K/day quota)
18:00 - 22:00: GLM-4.7 (cheap, resets 10AM)
22:00 - 08:00: MiniMax or iFlow (5h rolling or free)
```
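The two reset styles in the table, a fixed daily reset and a rolling 5-hour window, behave differently, and a short sketch may help. Both helpers below are hypothetical and make assumptions the table does not state: times are naive local datetimes with no timezone handling, and a rolling window is treated as "sum of usage in the trailing 5 hours".

```python
from datetime import datetime, timedelta


def next_daily_reset(now, hour=10):
    """Next occurrence of a fixed daily reset hour (10:00 for GLM-4.7 in the table)."""
    reset = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if reset <= now:
        reset += timedelta(days=1)  # today's reset already passed
    return reset


def rolling_window_usage(events, now, window=timedelta(hours=5)):
    """Sum tokens from (timestamp, tokens) events inside the trailing window."""
    cutoff = now - window
    return sum(tokens for ts, tokens in events if ts > cutoff)


now = datetime(2025, 1, 1, 18, 0)
events = [
    (datetime(2025, 1, 1, 12, 0), 1_000_000),  # older than 5 hours, ignored
    (datetime(2025, 1, 1, 15, 0), 2_000_000),
    (datetime(2025, 1, 1, 17, 30), 500_000),
]
print(next_daily_reset(now))              # 2025-01-02 10:00:00
print(rolling_window_usage(events, now))  # 2500000
```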
---

## Monitoring & Alerts

### Dashboard Quota Tracker

```
Dashboard → Quota Overview:
  Claude Code: 2.5h / 5h remaining (50%)
  Gemini CLI: 450 / 1000 requests today
  GLM-4.7: 5M / 10M tokens (resets in 8h)
  MiniMax: 3M / 5M tokens (rolling 5h)
```

### Real-Time Notifications

```
Dashboard → Notifications:
  ⚠️ Claude Code quota 80% used (1h remaining)
  ✅ GLM-4.7 quota reset (10M tokens available)
  💰 Daily budget 50% used ($2.50 / $5)
```
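The 80% alert shown above amounts to a threshold check on used versus total quota. A minimal sketch, assuming a hypothetical `quota_alerts` helper and made-up sample figures (4 of 5 hours used matches the "80% used (1h remaining)" notice):

```python
def quota_alerts(usage, threshold=0.80):
    """Return provider names whose used/limit ratio is at or above the threshold.

    `usage` maps a provider name to a (used, limit) pair in any consistent unit.
    """
    return [
        name
        for name, (used, limit) in usage.items()
        if limit > 0 and used / limit >= threshold
    ]


usage = {
    "Claude Code": (4.0, 5.0),            # hours
    "Gemini CLI": (550, 1000),            # requests
    "GLM-4.7": (5_000_000, 10_000_000),   # tokens
}
print(quota_alerts(usage))  # ['Claude Code']
```

The same function works for budget alerts if the pair holds dollars spent and the dollar limit, with a different threshold such as `0.50`.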
### Usage Analytics

```
Dashboard → Analytics:
  Today: 50M tokens
  - 30M via Claude Code (subscription)
  - 15M via GLM-4.7 ($9)
  - 5M via iFlow (free)

  Cost: $9 (vs $1000 on ChatGPT API)
  Savings: 99%
```
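The figures in this panel follow from simple arithmetic, worked through below. The sketch assumes all 15M GLM-4.7 tokens are billed at the $0.60/1M input rate listed earlier and takes the $1000 comparison figure as given; `cost_usd` is a hypothetical helper.

```python
def cost_usd(tokens, price_per_million):
    """Cost of a token count at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


glm_cost = cost_usd(15_000_000, 0.60)  # cheap tier
total_cost = glm_cost + 0.0 + 0.0      # subscription and free tiers add $0 extra
baseline = 1000.0                      # comparison figure quoted in the panel
savings = 1 - total_cost / baseline

print(f"${total_cost:.2f}")  # $9.00
print(f"{savings:.1%}")      # 99.1%
```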
---

## Troubleshooting

**Issue: "All providers quota exhausted"**

**Solution:**
1. Check dashboard quota tracker
2. Wait for quota reset (see countdown)
3. Add free tier to fallback chain
4. Or increase budget limit

**Issue: "Too many fallback switches"**

**Solution:**
1. Check if primary provider is down
2. Increase quota limits (upgrade subscription)
3. Use cheaper primary model (GLM instead of Claude)

**Issue: "Unexpected costs"**

**Solution:**
1. Dashboard → Analytics → Review usage
2. Set daily/monthly budget limits
3. Switch to free tier for non-critical tasks
4. Use combos with free fallback

---

## Related

- [Combos](./combos.md) - Create custom fallback chains
- [Quota Tracking](./quota-tracking.md) - Monitor usage and costs