sseToJsonHandler.js unconditionally deleted reasoning_content from all
non-streaming responses (added for Firecrawl SDK compatibility). This
breaks thinking models (Qwen3.5, Claude extended thinking, etc.) where
the model may use all tokens for reasoning, leaving content empty.
When reasoning_content is stripped in that case, the response appears
completely empty to the client.
Fix: only strip reasoning_content when the response also has non-empty
content, so that reasoning output is preserved when it is the only
useful output.
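A minimal sketch of the conditional strip described above. The helper name stripReasoningIfRedundant is hypothetical, and the sketch assumes content is a plain string; the actual code in sseToJsonHandler.js may be structured differently.

```javascript
// Hypothetical helper: remove reasoning_content only when the message
// also carries non-empty content, so reasoning-only replies survive.
function stripReasoningIfRedundant(response) {
  for (const choice of response.choices ?? []) {
    const message = choice.message;
    if (!message || !("reasoning_content" in message)) continue;

    // Assumption: content is a string (not an array of content parts).
    const content = message.content;
    const hasContent = typeof content === "string" && content.trim().length > 0;

    if (hasContent) delete message.reasoning_content;
  }
  return response;
}
```

With this shape, a response whose content is "" keeps its reasoning_content, while a response with visible content has it removed as before.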
Co-authored-by: Agent Zero <agent@agent-zero.local>
Cursor's API now rejects requests with outdated client versions,
returning [400]: Update Required for Composer 2. Bump
x-cursor-client-version from 2.3.41 to 3.1.0 across all three
locations where it is set.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add clientDetector utility to identify CLI tools (Claude Code, Gemini CLI,
Antigravity, Codex) from request headers. When the CLI tool and provider
are a native pair, skip all translation and only swap the model and Bearer token.
Made-with: Cursor
Replace empty reasoning_content with an explicit </think> closing tag when exiting a thinking block, so that the end of the reasoning section is signaled properly in streaming responses.
- Encode thoughtSignature into tool_call.id using _TSIG_ delimiter and base64url
- Decode _TSIG_ on request to restore thoughtSignature for Gemini multi-turn thinking
- Track pendingThoughtSignature across parts for deferred signature attachment
- Add LocalMutex (2-layer locking) to prevent ELOCKED on concurrent DB access
- Increase lockfile retries from 5 to 15 for multi-process robustness
- Restore db.json seed on first run to prevent ENOENT on lockfile.lock
- Use process.env.BASE_URL fallback in models test route
- Remove gemini-3-flash-lite-preview from provider models
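The _TSIG_ round trip from the first two bullets could look roughly like the sketch below. The function names are hypothetical, and it assumes the original tool call ID never contains the delimiter itself; the translator's real code may differ.

```javascript
// Delimiter named in the commit; everything after it is the encoded signature.
const TSIG_DELIMITER = "_TSIG_";

// Hypothetical helper: append the signature to the ID as base64url,
// which only uses characters from [A-Za-z0-9_-].
function encodeToolCallId(baseId, thoughtSignature) {
  if (!thoughtSignature) return baseId;
  const encoded = Buffer.from(thoughtSignature, "utf8").toString("base64url");
  return `${baseId}${TSIG_DELIMITER}${encoded}`;
}

// Hypothetical helper: split on the FIRST delimiter so that a delimiter-like
// run inside the encoded payload cannot confuse the parse.
function decodeToolCallId(toolCallId) {
  const idx = toolCallId.indexOf(TSIG_DELIMITER);
  if (idx === -1) return { id: toolCallId, thoughtSignature: null };
  const encoded = toolCallId.slice(idx + TSIG_DELIMITER.length);
  return {
    id: toolCallId.slice(0, idx),
    thoughtSignature: Buffer.from(encoded, "base64url").toString("utf8"),
  };
}
```

Carrying the signature inside the ID means clients that echo tool_call.id back unchanged return it on the next turn without needing any extra field.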
Co-authored-by: kwanLeeFrmVi <quanle96@outlook.com>
Closes #450
Made-with: Cursor
- Add claudeHeaderCache.js to intercept and cache live Claude Code client headers
- Forward cached headers dynamically to api.anthropic.com via default.js
- Strip first-party identity headers (x-app, claude-code-* beta) for non-Anthropic upstreams
- Validate and sanitize tool call IDs to match Anthropic pattern (^[a-zA-Z0-9_-]+$)
- Skip thinking blocks when applying cache_control; fix max_tokens buffer (+1024)
- Strip cache_control from thinking blocks in openai-to-claude translator
- Comment out thoughtSignature in Gemini translator (kept for reference)
- Expand .gitignore to match all deploy*.sh variants
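For the tool call ID validation bullet, one possible approach is sketched below. The helper name, the underscore replacement, and the fallback prefix are assumptions for illustration; only the pattern itself comes from the commit.

```javascript
// Pattern quoted in the commit for valid Anthropic tool call IDs.
const ANTHROPIC_TOOL_ID_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Hypothetical helper: pass valid IDs through, replace any other
// character with "_", and generate an ID if nothing usable remains.
function sanitizeToolCallId(id, fallbackPrefix = "toolu") {
  if (typeof id === "string" && ANTHROPIC_TOOL_ID_PATTERN.test(id)) return id;
  const cleaned = String(id ?? "").replace(/[^a-zA-Z0-9_-]/g, "_");
  return cleaned.length > 0 ? cleaned : `${fallbackPrefix}_${Date.now()}`;
}
```

Because the replacement is deterministic for non-empty input, applying it to both the tool_use ID and the matching tool_result reference keeps the pair consistent.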
Co-authored-by: kwanLeeFrmVi <quanle96@outlook.com>
Closes #433
Made-with: Cursor
The BaseExecutor's buildUrl() and buildHeaders() methods only handled
openai-compatible-* providers but not anthropic-compatible-* providers.
This caused Anthropic-compatible synthetic providers to fail API testing
by hitting the wrong endpoint (returning documentation instead of valid
API responses) and using incorrect auth headers.
Changes:
- Added buildUrl() handling for anthropic-compatible-* providers
to append /messages path
- Added buildHeaders() handling for x-api-key header and
anthropic-version for anthropic-compatible providers
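A rough sketch of the two branches, written as standalone functions rather than BaseExecutor methods. The /chat/completions suffix, the Bearer fallback, and the anthropic-version value "2023-06-01" are assumptions; the commit text only names the /messages path and the two header names.

```javascript
// Hypothetical standalone version of buildUrl(): pick the endpoint path
// from the synthetic provider prefix.
function buildUrl(provider, baseUrl) {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  if (provider.startsWith("anthropic-compatible-")) return `${trimmed}/messages`;
  if (provider.startsWith("openai-compatible-")) return `${trimmed}/chat/completions`; // assumed path
  return trimmed;
}

// Hypothetical standalone version of buildHeaders(): Anthropic-style
// providers authenticate with x-api-key instead of a Bearer token.
function buildHeaders(provider, apiKey) {
  const headers = { "Content-Type": "application/json" };
  if (provider.startsWith("anthropic-compatible-")) {
    headers["x-api-key"] = apiKey;
    headers["anthropic-version"] = "2023-06-01"; // assumed version string
  } else {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return headers;
}
```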
Fixes #XXX
Co-authored-by: Bitgineer <bitgineer@bitgineer.shop>
The github provider in open-sse/config/providers.js was missing clientId,
causing refreshGitHubToken() to send client_id=undefined on 401 retry.
Also guard against undefined clientSecret in both refresh implementations.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Apply fix from PR #354 by @tannk4w to properly signal tool_calls finish_reason
when model emits tool calls, allowing OpenAI-compatible clients to continue with
tool result processing instead of stopping prematurely.
Refactored finish_reason logic into computeFinishReason() helper to eliminate
duplication and improve maintainability across flush and completion paths.
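The shape of such a helper might be as simple as the sketch below. The state fields toolCallCount and upstreamFinishReason are hypothetical names, not necessarily those used by the translator.

```javascript
// Hypothetical sketch of computeFinishReason(): one place that decides
// the finish_reason for both the flush and completion paths.
function computeFinishReason(state) {
  // Any emitted tool call wins, so clients go on to process tool results.
  if (state.toolCallCount > 0) return "tool_calls";
  // Otherwise keep what upstream reported, defaulting to "stop".
  return state.upstreamFinishReason ?? "stop";
}
```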
Co-authored-by: tannk4w <tannk@tmi-soft.vn>
Thanks to @tannk4w, @trungtq2799, @quanhavn, and @East-rayyy for the thorough
review and improvement suggestions on the original PR.
Made-with: Cursor
Adds OpenCode (https://github.com/opencode-ai/opencode) as a supported
provider. OpenCode is an open-source terminal AI coding assistant with
an OpenAI-compatible API running locally.
Changes:
- open-sse/config/providers.js: add opencode baseUrl (localhost:4096)
with openai format (fully compatible, no custom headers needed)
- open-sse/services/model.js: add 'oc' alias → opencode
- src/shared/constants/providers.js: add opencode to subscription
providers with alias 'oc', icon 'terminal', color #E87040
Usage after setup: use model prefix 'oc/<model>' to route through
a running OpenCode instance (e.g. oc/claude-sonnet-4-5).
Closes #378
When using SA JSON + Bearer token, Vertex AI requires a project-scoped URL.
The old code used the global publishers endpoint which only works with a raw API key,
causing RESOURCE_PROJECT_INVALID errors.
Changes in open-sse/executors/vertex.js buildUrl():
- SA JSON path: projects/{projectId}/locations/{location}/publishers/google/models/{model}:{action}
- Appends ?alt=sse for streaming on SA JSON path
- Location defaults to us-central1, overridable via providerSpecificData.location
- Raw API key path unchanged (global publishers + ?key= param)
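The URL selection can be sketched as follows. The hostnames, the function name buildVertexUrl, and the action names are assumptions based on the public Vertex AI REST layout; only the path template, ?alt=sse, the us-central1 default, and the ?key= parameter come from the list above.

```javascript
// Hypothetical standalone version of the buildUrl() logic in vertex.js.
function buildVertexUrl({ model, stream, projectId, location = "us-central1", apiKey }) {
  const action = stream ? "streamGenerateContent" : "generateContent";

  if (projectId) {
    // SA JSON + Bearer token: project-scoped, regional endpoint (assumed host).
    const base = `https://${location}-aiplatform.googleapis.com/v1`;
    const path = `projects/${projectId}/locations/${location}/publishers/google/models/${model}:${action}`;
    return `${base}/${path}${stream ? "?alt=sse" : ""}`;
  }

  // Raw API key: global publishers endpoint with ?key= (assumed host).
  // Streaming parameters on this path are not described in the commit.
  return `https://aiplatform.googleapis.com/v1/publishers/google/models/${model}:${action}?key=${encodeURIComponent(apiKey)}`;
}
```

In this sketch a non-empty projectId selects the SA JSON branch; the real executor presumably decides based on the stored credential type.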
Co-authored-by: anuragg-saxenaa <anuragg.saxenaa@gmail.com>
Made-with: Cursor
Previously only base64 data: URLs were handled in the OpenAI-to-Claude
and OpenAI-to-Gemini request translators. HTTP/HTTPS image URLs were
silently dropped, causing vision-capable models to respond with
"I don't see any image."
Add stream_options: { include_usage: true } to iFlow streaming requests
to get token usage data in the final streaming chunk. This fixes token
counts showing as 0 for iFlow streaming requests.
Only injected when streaming is enabled and body.messages exists (OpenAI
format), and the client hasn't already set stream_options.
Note: Applied only to iFlow executor instead of BaseExecutor to avoid
affecting all providers globally. This gives us more control and allows
testing with iFlow first.
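The three conditions above reduce to a small guard. This is a sketch only; withUsageInStream is a hypothetical name, and the iFlow executor may apply the change in place rather than returning a copy.

```javascript
// Hypothetical helper: ask upstream for usage in the final chunk, but only
// for streaming OpenAI-format bodies where the client has not set
// stream_options itself.
function withUsageInStream(body, stream) {
  if (!stream) return body;
  if (!Array.isArray(body.messages)) return body; // not OpenAI chat format
  if (body.stream_options) return body; // respect the client's own choice
  return { ...body, stream_options: { include_usage: true } };
}
```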
Fixes #74
Co-authored-by: Ibrahim Ryan <ryan@nuevanext.com>
Made-with: Cursor
- Add comboRotationState Map to track rotation per combo
- Add getRotatedModels() to rotate model order based on strategy
- Pass comboName and comboStrategy to handleComboChat()
- Add comboStrategy setting (default: fallback)
- Add UI toggle for Combo Round Robin in profile settings
When enabled, each request to a combo starts with a different provider
instead of always starting with the first one, distributing load evenly.
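A minimal sketch of the rotation, assuming the non-default strategy value is the string "round-robin" (the commit only names the default, fallback) and that the state is a per-combo offset counter.

```javascript
// Per-combo rotation offset, keyed by combo name.
const comboRotationState = new Map();

// Sketch of getRotatedModels(): with the fallback strategy the order is
// unchanged; with round robin each call starts one position later.
function getRotatedModels(comboName, models, strategy = "fallback") {
  if (strategy !== "round-robin" || models.length <= 1) return models;

  const offset = (comboRotationState.get(comboName) ?? 0) % models.length;
  comboRotationState.set(comboName, (offset + 1) % models.length);

  // Rotate without mutating the caller's array.
  return [...models.slice(offset), ...models.slice(0, offset)];
}
```

The remaining models still follow in order after the rotated starting point, so fallback behavior within a single request is preserved.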
Co-authored-by: Antigravity Agent <antigravity@example.com>
Some upstream providers (e.g. Antigravity) return non-standard finish_reason
values like 'other' instead of the OpenAI-standard 'tool_calls' when the
model invokes tools. This causes downstream consumers (e.g. OpenClaw) to
fail to execute tool calls, breaking agentic sub-agent workflows.
Changes:
- nonStreamingHandler: post-translation guard that normalizes finish_reason
to 'tool_calls' when message.tool_calls is present
- sseToJsonHandler: accumulate tool_calls from streaming deltas in
parseSSEToOpenAIResponse; extract function_call items from Responses API
output in handleForcedSSEToJson
- openai-responses translator: use toolCallIndex to choose between
'tool_calls' and 'stop' in flush and response.completed events
Tested: 7 scenarios (non-stream text, single/multiple tool calls, stream
text/tool calls, multi-turn tool conversation, tools present but unused)
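The delta accumulation and the finish_reason guard can be illustrated together. The sketch below works on already-parsed OpenAI-style chunk objects; accumulateToolCalls is a hypothetical name and is not the actual parseSSEToOpenAIResponse implementation.

```javascript
// Hypothetical helper: merge streamed tool_call deltas by index and
// normalize finish_reason when any tool call was emitted.
function accumulateToolCalls(chunks) {
  const byIndex = new Map();
  let finishReason = null;

  for (const chunk of chunks) {
    const choice = chunk.choices?.[0];
    if (!choice) continue;
    if (choice.finish_reason) finishReason = choice.finish_reason;

    for (const delta of choice.delta?.tool_calls ?? []) {
      const index = delta.index ?? 0;
      let call = byIndex.get(index);
      if (!call) {
        call = { id: "", type: "function", function: { name: "", arguments: "" } };
        byIndex.set(index, call);
      }
      if (delta.id) call.id = delta.id;
      // Name and arguments arrive as string fragments to concatenate.
      if (delta.function?.name) call.function.name += delta.function.name;
      if (delta.function?.arguments) call.function.arguments += delta.function.arguments;
    }
  }

  const toolCalls = [...byIndex.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([, call]) => call);

  // Guard: non-standard values such as 'other' become 'tool_calls'.
  if (toolCalls.length > 0) finishReason = "tool_calls";
  return { toolCalls, finishReason: finishReason ?? "stop" };
}
```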
Kiro returns HTTP 400 with 'Improperly formed request (reset after Xs)'
when a model is not available on that account's subscription tier.
Previously this fell through to COOLDOWN_MS.transient (30s), causing
rapid retries on all accounts before failing, so all accounts got locked
simultaneously with no actual fallback.
Treating this as paymentRequired (2min cooldown) ensures:
1. The model is locked on that account for 2min (proper cooldown)
2. The next available account is tried immediately
3. If all accounts hit the same 400, 9Router falls through to the
next provider in the combo
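The classification could be sketched like this. The function name, the regular expressions, and the return shape are assumptions; the 30s and 2min values are the ones quoted above.

```javascript
// Cooldown values quoted in the commit message.
const COOLDOWN_MS = { transient: 30_000, paymentRequired: 120_000 };

// Hypothetical classifier: a Kiro 400 carrying the "Improperly formed
// request (reset after Xs)" text is treated as paymentRequired.
function classifyKiroError(status, message) {
  const text = String(message ?? "");
  const isTierLimit =
    status === 400 &&
    /Improperly formed request/i.test(text) &&
    /reset after/i.test(text);

  return isTierLimit
    ? { kind: "paymentRequired", cooldownMs: COOLDOWN_MS.paymentRequired }
    : { kind: "transient", cooldownMs: COOLDOWN_MS.transient };
}
```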
Fixes #384
Root cause: Codex/OpenAI Responses streams multiple alternating reasoning and
message output items. The first message block often has empty output_text; the
visible answer lives in a later message. Previous code used output.find() which
always picked the first (empty) message block.
Fix: walk message items from end and use the last message whose extracted text
is non-empty; fall back to final message if all are empty.
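A sketch of the backward walk, assuming Responses API output items of type "message" whose content parts carry output_text. Both function names are hypothetical.

```javascript
// Hypothetical helper: concatenate all output_text parts of a message item.
function extractText(item) {
  return (item.content ?? [])
    .filter((part) => part.type === "output_text")
    .map((part) => part.text ?? "")
    .join("");
}

// Hypothetical helper: walk message items from the end, return the last
// one with non-empty text, else fall back to the final message item.
function pickFinalMessage(output) {
  const messages = output.filter((item) => item.type === "message");
  for (let i = messages.length - 1; i >= 0; i--) {
    if (extractText(messages[i]).trim().length > 0) return messages[i];
  }
  return messages[messages.length - 1] ?? null;
}
```

By contrast, output.find() stops at the first message item, which is exactly the empty block this fix avoids.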
Note: Removed debug logging code from original PR #383 to keep implementation clean.
Co-authored-by: lokinh <locnh@uniultra.xyz>
Made-with: Cursor
- fixes #335: on transient 502/503/504, wait for a short cooldown (up to
5s) before falling to next combo model, giving the provider a chance
to recover rather than immediately skipping it
- fixes #334: when all combo models have no active credentials, return
503 (Service Unavailable) instead of 406 (Not Acceptable), which is
more accurate and retriable by clients
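Both changes can be sketched as small pure functions. The names, the cooldownMs input, and the result object shape are assumptions for illustration; the 5s cap and the status codes come from the list above.

```javascript
// Hypothetical helper: how long to wait before moving to the next combo
// model. Only 502/503/504 trigger a wait, capped at 5 seconds.
function transientWaitMs(status, cooldownMs, maxWaitMs = 5000) {
  if (![502, 503, 504].includes(status)) return 0;
  return Math.min(Math.max(cooldownMs, 0), maxWaitMs);
}

// Hypothetical helper: status to return when no combo model could be used.
// attempts is a list of { hadCredentials: boolean } records.
function exhaustedComboStatus(attempts) {
  const noneHadCredentials = attempts.every((a) => !a.hadCredentials);
  // 503 tells clients the condition is temporary and worth retrying.
  return noneHadCredentials ? 503 : null;
}
```

A caller would typically await a timer for transientWaitMs(...) milliseconds and then continue with the next model in the combo.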