8.6 KiB
Smart Routing & Auto Fallback
9Router automatically routes your requests through the best available provider using a 3-tier fallback system. Never stop coding due to quota limits or rate limiting.
How It Works
9Router uses intelligent routing to maximize your existing subscriptions, minimize costs, and ensure 24/7 availability:
Request → 9Router → Check Tier 1 (Subscription)
↓ quota exhausted
Check Tier 2 (Cheap)
↓ budget limit
Check Tier 3 (Free)
↓
Response
3-Tier Fallback System
Tier 1: SUBSCRIPTION (Primary)
- Claude Code (Pro/Max)
- OpenAI Codex (Plus/Pro)
- Gemini CLI (FREE 180K/month)
- GitHub Copilot
- Antigravity (Google)
Goal: Maximize value from subscriptions you already pay for.
Tier 2: CHEAP (Backup)
- GLM-4.7 ($0.60/1M input)
- MiniMax M2.1 ($0.20/1M input)
- Kimi K2 ($9/month flat)
Goal: Ultra-cheap backup when subscription quota runs out (~90% cheaper than ChatGPT API).
Tier 3: FREE (Emergency)
- iFlow (8 models)
- Qwen (3 models)
- Kiro (Claude FREE)
Goal: Zero-cost fallback for unlimited coding.
Automatic Switching
9Router monitors quota in real-time and switches providers automatically:
Scenario 1: Subscription Quota Exhausted
User request → cc/claude-opus-4-5
↓ quota exhausted (5-hour limit reached)
Auto switch → glm/glm-4.7
↓ daily quota exhausted
Auto switch → minimax/MiniMax-M2.1
↓ 5-hour quota exhausted
Auto switch → if/kimi-k2-thinking (FREE)
↓
Response delivered ✅
Result: Zero downtime, seamless experience.
Scenario 2: Rate Limiting
User request → cx/gpt-5.2-codex
↓ rate limited (too many requests)
Auto switch → glm/glm-4.7
↓
Response delivered ✅
Scenario 3: Provider Unavailable
User request → cc/claude-opus-4-5
↓ provider error (503)
Auto switch → next available model
↓
Response delivered ✅
Model Selection Logic
9Router selects the best model based on:
- Quota availability - Check if provider has remaining quota
- Cost tier - Prefer subscription → cheap → free
- Reset timing - Consider when quota resets
- Provider health - Skip providers with errors
Priority Order Example
For a request to cc/claude-opus-4-5:
1. Check Claude Code quota
✅ Available → Use cc/claude-opus-4-5
❌ Exhausted → Continue to step 2
2. Check fallback tier (if configured)
✅ GLM quota available → Use glm/glm-4.7
❌ Exhausted → Continue to step 3
3. Check free tier
✅ iFlow available → Use if/kimi-k2-thinking
❌ All exhausted → Return quota error
Configuration Options
Dashboard Settings
1. Enable/Disable Auto Fallback
Dashboard → Settings → Smart Routing
→ Toggle "Auto Fallback" ON/OFF
- ON (default): Automatic tier switching
- OFF: Strict mode, return error if primary model unavailable
2. Set Budget Limits
Dashboard → Settings → Budget Control
→ Daily limit: $5
→ Monthly limit: $50
When budget reached, 9Router automatically switches to free tier.
3. Configure Fallback Order
Dashboard → Settings → Fallback Priority
→ Drag to reorder providers within each tier
Example custom order:
Tier 1: Gemini CLI → Claude Code → Codex
Tier 2: MiniMax → GLM → Kimi
Tier 3: iFlow → Kiro → Qwen
4. Quota Reset Notifications
Dashboard → Settings → Notifications
→ Email when quota resets
→ Alert when 80% quota used
Examples
Example 1: Basic Auto Fallback
Setup:
Model: cc/claude-opus-4-5-20251101
Fallback: Auto (default 3-tier)
Behavior:
Morning (fresh quota):
Request → cc/claude-opus-4-5 ✅
Afternoon (quota exhausted):
Request → glm/glm-4.7 ✅ (auto switched)
Evening (GLM quota out):
Request → minimax/MiniMax-M2.1 ✅ (auto switched)
Late night (all paid quota out):
Request → if/kimi-k2-thinking ✅ (free tier)
Cost: ~$5-10/month extra (mostly covered by subscription).
Example 2: Budget-Conscious Routing
Setup:
Dashboard → Settings:
Daily budget: $2
Monthly budget: $20
Fallback: Enabled
Behavior:
Day 1-15 (within budget):
Requests → glm/glm-4.7 (cheap tier)
Cost: $1.50/day
Day 16 (budget reached):
Requests → if/kimi-k2-thinking (free tier)
Cost: $0
Next month (budget resets):
Requests → glm/glm-4.7 again
Result: Never exceed $20/month, always available.
Example 3: Subscription-Only Mode
Setup:
Dashboard → Settings:
Auto Fallback: OFF
Strict mode: ON
Behavior:
Request → cc/claude-opus-4-5
✅ Quota available → Success
❌ Quota exhausted → Return error (no fallback)
Use case: When you only want to use paid subscriptions, no extra costs.
Example 4: Free-Only Mode
Setup:
Model: if/kimi-k2-thinking
Fallback: qw/qwen3-coder-plus → kr/claude-sonnet-4.5
Behavior:
All requests → Free tier only
Cost: $0 forever
Use case: Personal projects, learning, experimentation.
Best Practices
1. Maximize Subscription Value
Strategy:
- Set subscription models as Tier 1
- Monitor quota usage in dashboard
- Use cheap tier only when subscription exhausted
Example combo:
cc/claude-opus-4-5 → glm/glm-4.7 → if/kimi-k2-thinking
2. Optimize for Cost
Strategy:
- Use Gemini CLI free tier first (180K/month)
- Fallback to GLM/MiniMax (ultra-cheap)
- Emergency: iFlow (free)
Example combo:
gc/gemini-3-flash-preview → glm/glm-4.7 → if/kimi-k2-thinking
3. Optimize for Quality
Strategy:
- Use best models (Claude Opus, GPT-5.2)
- Fallback to good cheap models (GLM-4.7)
- Last resort: Free tier
Example combo:
cc/claude-opus-4-5 → cx/gpt-5.2-codex → glm/glm-4.7
4. 24/7 Availability
Strategy:
- Always include free tier in fallback
- Monitor quota reset times
- Distribute usage across providers
Example combo:
cc/claude-opus-4-5 → glm/glm-4.7 → minimax/MiniMax-M2.1 → if/kimi-k2-thinking
Result: Never run out of quota, code anytime.
Quota Reset Strategy
Plan your usage around quota reset times:
| Provider | Quota Reset | Strategy |
|---|---|---|
| Claude Code | 5-hour + weekly | Use in morning, fresh quota |
| Codex | 5-hour + weekly | Use after Claude quota out |
| Gemini CLI | Daily (1K) + Monthly (180K) | Use throughout day |
| GLM-4.7 | Daily 10:00 AM | Use evening, resets next morning |
| MiniMax M2.1 | 5-hour rolling | Use anytime, tracks rolling window |
| iFlow/Qwen/Kiro | No limit | Emergency backup |
Daily routine example:
08:00 - 13:00: Claude Code (fresh 5h quota)
13:00 - 18:00: Gemini CLI (1K/day quota)
18:00 - 22:00: GLM-4.7 (cheap, resets 10AM)
22:00 - 08:00: MiniMax or iFlow (5h rolling or free)
Monitoring & Alerts
Dashboard Quota Tracker
Dashboard → Quota Overview:
Claude Code: 2.5h / 5h remaining (50%)
Gemini CLI: 450 / 1000 requests today
GLM-4.7: 5M / 10M tokens (resets in 8h)
MiniMax: 3M / 5M tokens (rolling 5h)
Real-Time Notifications
Dashboard → Notifications:
⚠️ Claude Code quota 80% used (1h remaining)
✅ GLM-4.7 quota reset (10M tokens available)
💰 Daily budget 50% used ($2.50 / $5)
Usage Analytics
Dashboard → Analytics:
Today: 50M tokens
- 30M via Claude Code (subscription)
- 15M via GLM-4.7 ($9)
- 5M via iFlow (free)
Cost: $9 (vs $1000 on ChatGPT API)
Savings: 99%
Troubleshooting
Issue: "All providers quota exhausted"
Solution:
- Check dashboard quota tracker
- Wait for quota reset (see countdown)
- Add free tier to fallback chain
- Or increase budget limit
Issue: "Too many fallback switches"
Solution:
- Check if primary provider is down
- Increase quota limits (upgrade subscription)
- Use cheaper primary model (GLM instead of Claude)
Issue: "Unexpected costs"
Solution:
- Dashboard → Analytics → Review usage
- Set daily/monthly budget limits
- Switch to free tier for non-critical tasks
- Use combos with free fallback
Related
- Combos - Create custom fallback chains
- Quota Tracking - Monitor usage and costs