* feat: add AI request details feature with latency tracking
Add comprehensive request history and debugging capability to the Usage dashboard:
**Storage Layer** (usageDb.js):
- Add saveRequestDetail() for storing full request/response details
- Implement FIFO queue with 1000-record limit in request-details.json
- Auto-sanitize sensitive headers (authorization, api-key, cookie, token)
- Add getRequestDetails() with pagination and filtering support
- Add getRequestDetailById() for single record lookup
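A minimal sketch of the sanitizing and FIFO behaviour described above. Only the header list and the 1000-record limit come from this commit message; `sanitizeHeaders`, `pushWithLimit`, and the `[REDACTED]` marker are hypothetical names chosen for illustration, and the actual usageDb.js code may differ.

```javascript
// Assumption: header names are matched case-insensitively by substring.
const SENSITIVE_HEADER_PATTERNS = ["authorization", "api-key", "cookie", "token"];
const MAX_RECORDS = 1000;

// Return a copy of the headers with sensitive values masked.
function sanitizeHeaders(headers = {}) {
  const sanitized = {};
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    const isSensitive = SENSITIVE_HEADER_PATTERNS.some((p) => lower.includes(p));
    sanitized[name] = isSensitive ? "[REDACTED]" : value;
  }
  return sanitized;
}

// Append a record and drop the oldest entries once the cap is exceeded (FIFO).
function pushWithLimit(records, record, limit = MAX_RECORDS) {
  const next = [...records, record];
  return next.length > limit ? next.slice(next.length - limit) : next;
}

const clean = sanitizeHeaders({
  Authorization: "Bearer abc",
  "x-api-key": "k",
  Accept: "application/json"
});

let history = [];
for (let i = 0; i < 1005; i++) history = pushWithLimit(history, { id: i });
```

After 1005 pushes the list holds the newest 1000 records, so the oldest surviving record is the sixth one written.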
**Pipeline Integration** (chatCore.js):
- Track request start time and calculate total latency
- Record TTFT (Time To First Token) and total latency for all requests
- Capture full request details (messages, model, parameters)
- Save response content for non-streaming, mark streaming responses
- Handle error cases with detailed error information
- Async non-blocking saves to avoid impacting request performance
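The latency bookkeeping above could be sketched roughly as follows. `createLatencyTracker` is a hypothetical helper, not the code in chatCore.js; the clock is injectable so the example is deterministic, and the fallback of TTFT to total latency for non-streaming responses is an assumption.

```javascript
// Hypothetical tracker; `now` defaults to Date.now but can be replaced in tests.
function createLatencyTracker(now = Date.now) {
  const startedAt = now();
  let firstTokenAt = null;
  return {
    // Call on every streamed chunk; only the first call is recorded.
    markFirstToken() {
      if (firstTokenAt === null) firstTokenAt = now();
    },
    // Call once the response has finished.
    finish() {
      const total = now() - startedAt;
      // Assumption: without a separate first token, TTFT equals the total latency.
      const ttft = firstTokenAt === null ? total : firstTokenAt - startedAt;
      return { ttft, total };
    }
  };
}

// Fake clock: request starts at 0 ms, first token at 120 ms, stream ends at 900 ms.
const ticks = [0, 120, 900];
const tracker = createLatencyTracker(() => ticks.shift());
tracker.markFirstToken();
const latency = tracker.finish();
```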
**API Layer** (/api/usage/request-details):
- GET endpoint with pagination (page, pageSize: 1-100)
- Filter by provider, model, connectionId, status, date range
- Returns { details: [...], pagination: {...} } format
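As an illustration of the response shape, here is an in-memory sketch of the pagination and filtering. `paginateDetails` and the field names inside `pagination` are assumptions; only the `{ details, pagination }` envelope and the 1-100 page size range come from the description above.

```javascript
// Hypothetical in-memory equivalent of the endpoint's query logic.
function paginateDetails(records, { page = 1, pageSize = 20, provider, model, status } = {}) {
  // Clamp pageSize into the documented 1-100 range.
  const size = Math.min(100, Math.max(1, Number(pageSize) || 20));
  const filtered = records.filter((r) =>
    (!provider || r.provider === provider) &&
    (!model || r.model === model) &&
    (!status || r.status === status)
  );
  const totalItems = filtered.length;
  const totalPages = Math.ceil(totalItems / size);
  const current = Math.max(1, Math.floor(Number(page)) || 1);
  const start = (current - 1) * size;
  return {
    details: filtered.slice(start, start + size),
    pagination: { page: current, pageSize: size, totalItems, totalPages }
  };
}

const sample = Array.from({ length: 45 }, (_, i) => ({
  id: i,
  provider: i % 2 ? "openai" : "claude"
}));
const result = paginateDetails(sample, { page: 2, pageSize: 10, provider: "openai" });
const clamped = paginateDetails(sample, { pageSize: 500 });
```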
**UI Components**:
- Drawer.js: Right slide-out panel with backdrop blur and ESC close
- Pagination.js: Full pagination with page size selector (10/20/50)
- RequestDetailsTab.js: Complete table view with filters and detail drawer
**Dashboard Integration**:
- Add "Details" tab to Usage page (4th tab after Overview/Logger/Limits)
- Table columns: Timestamp, Model, Provider, Input Tokens, Output Tokens, Latency (TTFT/Total), Action
- Provider filter dropdown (9 providers supported)
- Date range filters (start/end datetime)
- Click "Detail" button to view full request/response JSON in slide-out drawer
**Features**:
- Real-time latency monitoring (TTFT & Total)
- Complete request/response inspection for debugging
- Filterable and searchable request history
- Responsive design with mobile-friendly filters
- Data security with automatic header sanitization
- Performance: async saves don't block request pipeline
**Files Created/Modified**:
- src/lib/usageDb.js (modified)
- open-sse/handlers/chatCore.js (modified)
- src/app/api/usage/request-details/route.js (new)
- src/shared/components/Drawer.js (new)
- src/shared/components/Pagination.js (new)
- src/app/(dashboard)/dashboard/usage/components/RequestDetailsTab.js (new)
- src/app/(dashboard)/dashboard/usage/page.js (modified)
Closes: AI Observability Dashboard feature
* feat: enhance request details with full config and streaming content capture
Improve Request Details feature to capture comprehensive request parameters
and actual streaming response content:
**Request Configuration Enhancement** (chatCore.js):
- Add extractRequestConfig() helper function to capture all request parameters
- Include temperature controls: temperature, top_p, top_k
- Include token limits: max_tokens, max_completion_tokens
- Include thinking/reasoning modes: thinking, reasoning, enable_thinking
- Include OpenAI parameters: presence_penalty, frequency_penalty, seed, stop,
tools, tool_choice, response_format, n, logprobs, top_logprobs, logit_bias,
user, parallel_tool_calls, prediction, store, metadata
- Apply to all request types: non-streaming, streaming, and error cases
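One plausible shape for such a helper is a whitelist pick over the request body. The parameter names are the ones listed above; the function body is an illustrative guess rather than the code in chatCore.js.

```javascript
// Parameter names taken from the list above.
const CONFIG_KEYS = [
  "temperature", "top_p", "top_k",
  "max_tokens", "max_completion_tokens",
  "thinking", "reasoning", "enable_thinking",
  "presence_penalty", "frequency_penalty", "seed", "stop",
  "tools", "tool_choice", "response_format", "n", "logprobs", "top_logprobs",
  "logit_bias", "user", "parallel_tool_calls", "prediction", "store", "metadata"
];

// Copy only the known configuration fields that are actually set on the body.
function extractRequestConfig(body) {
  const config = {};
  if (!body || typeof body !== "object") return config;
  for (const key of CONFIG_KEYS) {
    if (body[key] !== undefined) config[key] = body[key];
  }
  return config;
}

const config = extractRequestConfig({
  model: "example-model",
  messages: [{ role: "user", content: "hi" }],
  temperature: 0,
  max_tokens: 256,
  thinking: { type: "enabled", budget_tokens: 1024 }
});
```

Checking against `undefined` rather than truthiness keeps meaningful falsy values such as `temperature: 0`.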
**Streaming Content Capture** (chatCore.js & stream.js):
- Add onStreamComplete callback mechanism to stream processors
- Accumulate content from all formats: OpenAI, Claude, Gemini
- Track content from delta.content, delta.reasoning_content, delta.text,
delta.thinking, and Gemini content.parts
- Save initial record with "[Streaming in progress...]" marker
- Update record with actual content when stream completes
- Include usage tokens when available from stream
**Files Modified**:
- open-sse/handlers/chatCore.js - extractRequestConfig() + streaming capture
- open-sse/utils/stream.js - onStreamComplete callback + content accumulation
**Benefits**:
- View complete request configuration in Request Details (thinking mode, etc.)
- See actual streaming response content instead of placeholder
- Better debugging and observability for AI requests
Refs: #request-details-enhancement
* feat: separate thinking/reasoning content from response content
Improve Request Details to display thinking process separately from final response:
**Backend Changes**:
- stream.js: Capture content and thinking separately in streaming mode
- Add accumulatedThinking variable alongside accumulatedContent
- Route delta.content to content, delta.reasoning_content to thinking
- Support OpenAI (reasoning_content), Claude (thinking), Gemini (part.thought)
- Update onStreamComplete callback to return { content, thinking } object
- chatCore.js: Update response structure to include thinking field
- Non-streaming: Extract thinking from reasoning_content field
- Streaming: Receive { content, thinking } from stream callback
- Error responses: Include thinking: null
- Initial streaming save: Include thinking: null
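The separation can be pictured with a small accumulator. This sketch covers OpenAI-style chunks only, and `createStreamAccumulator` is a hypothetical name; the variable names and the `{ content, thinking }` callback payload follow the description above.

```javascript
// Hypothetical accumulator; the real stream.js also handles Claude and Gemini shapes.
function createStreamAccumulator(onStreamComplete) {
  let accumulatedContent = "";
  let accumulatedThinking = "";
  return {
    // Feed one parsed chunk from the stream.
    push(chunk) {
      const delta = chunk?.choices?.[0]?.delta;
      if (!delta) return;
      if (typeof delta.content === "string") accumulatedContent += delta.content;
      if (typeof delta.reasoning_content === "string") accumulatedThinking += delta.reasoning_content;
    },
    // Call when the stream ends; thinking is null when the model produced none.
    end() {
      onStreamComplete({ content: accumulatedContent, thinking: accumulatedThinking || null });
    }
  };
}

let finalResult = null;
const acc = createStreamAccumulator((r) => { finalResult = r; });
acc.push({ choices: [{ delta: { reasoning_content: "Compare both " } }] });
acc.push({ choices: [{ delta: { reasoning_content: "options." } }] });
acc.push({ choices: [{ delta: { content: "Option A " } }] });
acc.push({ choices: [{ delta: { content: "is better." } }] });
acc.end();
```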
**Frontend Changes**:
- RequestDetailsTab.js: Display thinking and content in separate sections
- Add amber/yellow themed "Thinking Process" section with psychology icon
- Show "Final Response" label when thinking is present
- Use distinct visual styling for thinking (amber bg) vs content (gray bg)
- Only show thinking section when thinking content exists
**Benefits**:
- Users can clearly see model's reasoning process vs final answer
- Better debugging for models with thinking capabilities (Claude, o1, etc.)
- Visual distinction makes it easy to identify thinking vs response
Refs: #thinking-content-separation
* fix: map Claude thinking to reasoning_content field
Fix Claude thinking content to be properly captured as reasoning_content
instead of regular content, enabling separate display in Request Details:
**Changes**:
- claude-to-openai.js: Use reasoning_content field for thinking blocks
- thinking start: send { reasoning_content: "" } instead of { content: "```\n```" }
- thinking delta: map to reasoning_content instead of content
- thinking stop: send { reasoning_content: "" } instead of { content: "```\n```" }
**Why This Matters**:
- Previously Claude thinking was sent as `content` field, mixed with actual response
- Now thinking uses `reasoning_content` field, matching OpenAI's o1 format
- stream.js can now properly route thinking to accumulatedThinking variable
- Request Details UI will show Claude thinking in separate "Thinking Process" section
**Supported Thinking Formats**:
- OpenAI: delta.reasoning_content → thinking
- Claude: delta.thinking → reasoning_content (now fixed)
- Gemini: part.thought === true → thinking
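To make the Claude mapping concrete, the sketch below translates a single Claude `content_block_delta` event into an OpenAI-style delta. `claudeDeltaToOpenAI` is a hypothetical function and is much simpler than claude-to-openai.js, which also emits the start and stop chunks.

```javascript
// Hypothetical translator for one Claude streaming event; returns an OpenAI-style delta or null.
function claudeDeltaToOpenAI(event) {
  if (event?.type !== "content_block_delta" || !event.delta) return null;
  const { delta } = event;
  // Thinking goes to reasoning_content so it is never mixed into the answer text.
  if (delta.type === "thinking_delta") return { reasoning_content: delta.thinking ?? "" };
  if (delta.type === "text_delta") return { content: delta.text ?? "" };
  return null;
}

const thinkingDelta = claudeDeltaToOpenAI({
  type: "content_block_delta",
  index: 0,
  delta: { type: "thinking_delta", thinking: "Check the units first." }
});
const textDelta = claudeDeltaToOpenAI({
  type: "content_block_delta",
  index: 1,
  delta: { type: "text_delta", text: "The answer is 42." }
});
const ignored = claudeDeltaToOpenAI({ type: "ping" });
```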
Refs: #claude-thinking-fix
* feat(observability): capture and display full 4-layer request chain
Capture complete request/response chain in AI Request Details:
- Add providerRequest field (translated request sent to provider)
- Add providerResponse field (raw provider response, streaming indicator)
- Update chatCore.js at all 5 saveRequestDetail() call sites
- Reorganize UI into 4 collapsible sections with Material icons
- Preserve backward compatibility for old records
- Add distinct styling for streaming indicator
* fix(observability): resolve React duplicate key warning in request details table
- Use composite key (detail.id + index) to ensure unique keys
- Prevents React warnings when database contains duplicate IDs from old ID generation
* fix(observability): display actual content in streaming request details
Change providerResponse field for streaming requests from placeholder
"[Streaming - raw response not captured]" to actual final content.
This improves debugging experience by showing the real AI response
in the "Provider Response (Raw)" section instead of a confusing
placeholder message.
Files changed:
- open-sse/handlers/chatCore.js: Save contentObj.content to providerResponse
- src/app/.../RequestDetailsTab.js: Remove special handling for placeholder
* refactor(observability): migrate request details to SQLite for improved concurrency
- Replace LowDB JSON storage with better-sqlite3
- Enable WAL mode for true concurrent read/write support
- Add 5 indexes to accelerate queries (timestamp, provider, model, connection_id, status)
- Perform pagination at the database level to reduce memory footprint
- Maintain 1000 record limit with automatic cleanup of old data
- Ensure API compatibility via re-exports, requiring no caller changes
Performance improvements:
- Concurrent Writes: WAL mode lets reads proceed during a write, reducing lock contention
- Query Efficiency: Index-based searches replace full dataset loading
- Data Integrity: Atomic operations prevent file corruption
* fix(observability): resolve pagination statistics display issues
- Fix issue where totalItems=0 showed 'Showing 1 to 0 of 0 results'
- Hide pagination controls when totalItems=0 or totalPages<=1
- Standardize API response fields: pagination.total -> pagination.totalItems
Before: Incorrect stats shown for empty data, and pager visible even for single-page results
After: Stats hidden for empty data, pager hidden when navigation is unnecessary
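The fixed display rule can be sketched as a pure function. `getPaginationSummary` and its return shape are hypothetical; the "Showing X to Y of Z results" wording and the hide conditions come from the notes above.

```javascript
// Hypothetical helper; returns null when there is nothing to show.
function getPaginationSummary({ page, pageSize, totalItems }) {
  if (!totalItems || totalItems <= 0) return null;
  const totalPages = Math.ceil(totalItems / pageSize);
  const from = (page - 1) * pageSize + 1;
  const to = Math.min(page * pageSize, totalItems);
  return {
    text: `Showing ${from} to ${to} of ${totalItems} results`,
    // The pager is only useful when there is more than one page.
    showControls: totalPages > 1
  };
}

const empty = getPaginationSummary({ page: 1, pageSize: 20, totalItems: 0 });
const single = getPaginationSummary({ page: 1, pageSize: 20, totalItems: 7 });
const multi = getPaginationSummary({ page: 3, pageSize: 20, totalItems: 45 });
```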
* feat(observability): display friendly provider names in request details
- Add /api/usage/providers endpoint to dynamically fetch provider list with names
- Replace hardcoded provider options with dynamic loading from database
- Display friendly provider names instead of IDs in both table and detail drawer
- Support custom provider nodes (e.g., OpenAI-compatible) with user-defined names
- Add provider name caching to optimize performance
* fix(observability): use INSERT OR REPLACE for request details to handle streaming updates
* fix(observability): resolve zero-token display issue by ensuring streaming usage capture and fixing key mismatch
* fix(observability): separate TTFT and total latency calculation for streaming requests
* feat(observability): implement SQLite write queue and JSON size limits
- Added in-memory buffer and batch writing for SQLite to prevent lock contention
- Implemented a configurable 1MB JSON size limit to prevent DB bloat
- Added dashboard UI for observability performance and data management settings
- Integrated graceful shutdown handlers to prevent data loss
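A rough sketch of the buffering idea, with the database left out. `createWriteQueue`, `writeBatch`, and `maxBatchSize` are hypothetical names; `writeBatch` stands in for whatever performs one batched SQLite write in the real code.

```javascript
// Hypothetical batching queue; records are held in memory until a batch is full.
function createWriteQueue(writeBatch, { maxBatchSize = 50 } = {}) {
  let buffer = [];
  // Write everything currently buffered in one call and report how many records went out.
  const flush = () => {
    if (buffer.length === 0) return 0;
    const batch = buffer;
    buffer = [];
    writeBatch(batch);
    return batch.length;
  };
  return {
    enqueue(record) {
      buffer.push(record);
      if (buffer.length >= maxBatchSize) flush();
    },
    flush,
    size: () => buffer.length
  };
}

const batches = [];
const queue = createWriteQueue((batch) => batches.push(batch), { maxBatchSize: 3 });
for (let i = 0; i < 7; i++) queue.enqueue({ id: i });
const pendingBeforeShutdown = queue.size();
// A shutdown handler would call flush() so the remaining record is not lost.
queue.flush();
```

A production queue would likely also flush on a timer so that a quiet period does not leave records sitting in memory.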
* fix(observability): resolve ReferenceError by declaring dbInstance
323 lines
10 KiB
JavaScript
/**
 * Token Usage Tracking - Extract, normalize, estimate and log token usage
 */

import { saveRequestUsage, appendRequestLog } from "@/lib/usageDb.js";
import { FORMATS } from "../translator/formats.js";

// ANSI color codes
export const COLORS = {
  reset: "\x1b[0m",
  red: "\x1b[31m",
  green: "\x1b[32m",
  yellow: "\x1b[33m",
  blue: "\x1b[34m",
  cyan: "\x1b[36m"
};

// Buffer tokens to prevent context errors
const BUFFER_TOKENS = 2000;

// Get HH:MM:SS timestamp
function getTimeString() {
  return new Date().toLocaleTimeString("en-US", { hour12: false, hour: "2-digit", minute: "2-digit", second: "2-digit" });
}

/**
 * Add buffer tokens to usage to prevent context errors
 * @param {object} usage - Usage object (any format)
 * @returns {object} Usage with buffer added
 */
export function addBufferToUsage(usage) {
  if (!usage || typeof usage !== "object") return usage;

  const result = { ...usage };

  // Claude format
  if (result.input_tokens !== undefined) {
    result.input_tokens += BUFFER_TOKENS;
  }

  // OpenAI format
  if (result.prompt_tokens !== undefined) {
    result.prompt_tokens += BUFFER_TOKENS;
  }

  // Calculate or update total_tokens
  if (result.total_tokens !== undefined) {
    result.total_tokens += BUFFER_TOKENS;
  } else if (result.prompt_tokens !== undefined && result.completion_tokens !== undefined) {
    // Calculate total_tokens if not exists
    result.total_tokens = result.prompt_tokens + result.completion_tokens;
  }

  return result;
}

/**
 * Filter a usage object down to the fields allowed for the target format
 * @param {object} usage - Usage object (any format)
 * @param {string} targetFormat - Target format from FORMATS
 * @returns {object} Usage containing only the defined fields allowed for that format
 */
export function filterUsageForFormat(usage, targetFormat) {
  if (!usage || typeof usage !== "object") return usage;

  // Helper to pick only defined fields from usage
  const pickFields = (fields) => {
    const filtered = {};
    for (const field of fields) {
      if (usage[field] !== undefined) {
        filtered[field] = usage[field];
      }
    }
    return filtered;
  };

  // Define allowed fields for each format
  const formatFields = {
    [FORMATS.CLAUDE]: [
      'input_tokens', 'output_tokens',
      'cache_read_input_tokens', 'cache_creation_input_tokens',
      'estimated'
    ],
    [FORMATS.GEMINI]: [
      'promptTokenCount', 'candidatesTokenCount', 'totalTokenCount',
      'cachedContentTokenCount', 'thoughtsTokenCount',
      'estimated'
    ],
    [FORMATS.OPENAI_RESPONSES]: [
      'input_tokens', 'output_tokens',
      'input_tokens_details', 'output_tokens_details',
      'estimated'
    ],
    // OpenAI format (default for OPENAI, CODEX, KIRO, etc.)
    default: [
      'prompt_tokens', 'completion_tokens', 'total_tokens',
      'cached_tokens', 'reasoning_tokens',
      'prompt_tokens_details', 'completion_tokens_details',
      'estimated'
    ]
  };

  // Get fields for target format
  let fields = formatFields[targetFormat];

  // Use same fields for similar formats
  if (targetFormat === FORMATS.GEMINI_CLI || targetFormat === FORMATS.ANTIGRAVITY) {
    fields = formatFields[FORMATS.GEMINI];
  } else if (targetFormat === FORMATS.OPENAI_RESPONSE) {
    fields = formatFields[FORMATS.OPENAI_RESPONSES];
  } else if (!fields) {
    fields = formatFields.default;
  }

  return pickFields(fields);
}

/**
 * Normalize usage object - ensure all values are valid numbers
 */
export function normalizeUsage(usage) {
  if (!usage || typeof usage !== "object" || Array.isArray(usage)) return null;

  const normalized = {};
  const assignNumber = (key, value) => {
    if (value === undefined || value === null) return;
    const numeric = Number(value);
    if (Number.isFinite(numeric)) normalized[key] = numeric;
  };

  assignNumber("prompt_tokens", usage?.prompt_tokens);
  assignNumber("completion_tokens", usage?.completion_tokens);
  assignNumber("total_tokens", usage?.total_tokens);
  assignNumber("cache_read_input_tokens", usage?.cache_read_input_tokens);
  assignNumber("cache_creation_input_tokens", usage?.cache_creation_input_tokens);
  assignNumber("cached_tokens", usage?.cached_tokens);
  assignNumber("reasoning_tokens", usage?.reasoning_tokens);

  if (Object.keys(normalized).length === 0) return null;
  return normalized;
}

/**
 * Check if usage has valid token data
 * Valid = has at least one token field with value > 0
 * Invalid = empty object {}, null, undefined, no token fields, or all zeros
 */
export function hasValidUsage(usage) {
  if (!usage || typeof usage !== "object") return false;

  // Check for any known token field with value > 0
  const tokenFields = [
    "prompt_tokens", "completion_tokens", "total_tokens", // OpenAI
    "input_tokens", "output_tokens", // Claude
    "promptTokenCount", "candidatesTokenCount" // Gemini
  ];

  for (const field of tokenFields) {
    if (typeof usage[field] === "number" && usage[field] > 0) {
      return true;
    }
  }

  return false;
}

/**
 * Extract usage from any format (Claude, OpenAI, Gemini, Responses API)
 */
export function extractUsage(chunk) {
  if (!chunk || typeof chunk !== "object") return null;

  // Claude format (message_delta event)
  if (chunk.type === "message_delta" && chunk.usage && typeof chunk.usage === "object") {
    return normalizeUsage({
      prompt_tokens: chunk.usage.input_tokens || 0,
      completion_tokens: chunk.usage.output_tokens || 0,
      cache_read_input_tokens: chunk.usage.cache_read_input_tokens,
      cache_creation_input_tokens: chunk.usage.cache_creation_input_tokens
    });
  }

  // OpenAI Responses API format (response.completed or response.done)
  if ((chunk.type === "response.completed" || chunk.type === "response.done") && chunk.response?.usage && typeof chunk.response.usage === "object") {
    const usage = chunk.response.usage;
    return normalizeUsage({
      prompt_tokens: usage.input_tokens || usage.prompt_tokens || 0,
      completion_tokens: usage.output_tokens || usage.completion_tokens || 0,
      cached_tokens: usage.input_tokens_details?.cached_tokens,
      reasoning_tokens: usage.output_tokens_details?.reasoning_tokens
    });
  }

  // OpenAI format
  if (chunk.usage && typeof chunk.usage === "object" && chunk.usage.prompt_tokens !== undefined) {
    return normalizeUsage({
      prompt_tokens: chunk.usage.prompt_tokens,
      completion_tokens: chunk.usage.completion_tokens || 0,
      cached_tokens: chunk.usage.prompt_tokens_details?.cached_tokens,
      reasoning_tokens: chunk.usage.completion_tokens_details?.reasoning_tokens
    });
  }

  // Gemini format (Antigravity)
  if (chunk.usageMetadata && typeof chunk.usageMetadata === "object") {
    return normalizeUsage({
      prompt_tokens: chunk.usageMetadata?.promptTokenCount || 0,
      completion_tokens: chunk.usageMetadata?.candidatesTokenCount || 0,
      total_tokens: chunk.usageMetadata?.totalTokenCount,
      cached_tokens: chunk.usageMetadata?.cachedContentTokenCount,
      reasoning_tokens: chunk.usageMetadata?.thoughtsTokenCount
    });
  }

  return null;
}

/**
 * Estimate input tokens from request body
 * Calculate total body size for more accurate estimation
 */
export function estimateInputTokens(body) {
  if (!body || typeof body !== "object") return 0;

  try {
    // Calculate total body size (includes messages, tools, system, thinking config, etc.)
    const bodyStr = JSON.stringify(body);
    const totalChars = bodyStr.length;

    // Estimate: ~4 chars per token (rough average across all tokenizers)
    return Math.ceil(totalChars / 4);
  } catch (err) {
    // Fallback if stringify fails
    return 0;
  }
}

/**
 * Estimate output tokens from content length
 */
export function estimateOutputTokens(contentLength) {
  if (!contentLength || contentLength <= 0) return 0;
  return Math.max(1, Math.floor(contentLength / 4));
}

/**
 * Format usage object based on target format
 * @param {number} inputTokens - Input/prompt tokens
 * @param {number} outputTokens - Output/completion tokens
 * @param {string} targetFormat - Target format from FORMATS
 */
export function formatUsage(inputTokens, outputTokens, targetFormat) {
  // Claude format uses input_tokens/output_tokens
  if (targetFormat === FORMATS.CLAUDE) {
    return addBufferToUsage({
      input_tokens: inputTokens,
      output_tokens: outputTokens,
      estimated: true
    });
  }

  // Default: OpenAI format (works for openai, gemini, responses, etc.)
  return addBufferToUsage({
    prompt_tokens: inputTokens,
    completion_tokens: outputTokens,
    total_tokens: inputTokens + outputTokens,
    estimated: true
  });
}

/**
 * Estimate full usage when provider doesn't return it
 * @param {object} body - Request body for input token estimation
 * @param {number} contentLength - Content length for output token estimation
 * @param {string} targetFormat - Target format from FORMATS constant
 */
export function estimateUsage(body, contentLength, targetFormat = FORMATS.OPENAI) {
  return formatUsage(
    estimateInputTokens(body),
    estimateOutputTokens(contentLength),
    targetFormat
  );
}

/**
 * Log usage with cache info (green color)
 */
export function logUsage(provider, usage, model = null, connectionId = null) {
  if (!usage || typeof usage !== "object") return;

  const p = provider?.toUpperCase() || "UNKNOWN";

  // Support both formats:
  // - OpenAI: prompt_tokens, completion_tokens
  // - Claude: input_tokens, output_tokens
  const inTokens = usage?.prompt_tokens || usage?.input_tokens || 0;
  const outTokens = usage?.completion_tokens || usage?.output_tokens || 0;
  const accountPrefix = connectionId ? connectionId.slice(0, 8) + "..." : "unknown";

  let msg = `[${getTimeString()}] 📊 ${COLORS.green}[USAGE] ${p} | in=${inTokens} | out=${outTokens} | account=${accountPrefix}${COLORS.reset}`;

  // Add estimated flag if present
  if (usage.estimated) {
    msg += ` ${COLORS.yellow}(estimated)${COLORS.reset}`;
  }

  // Add cache info if present (unified from different formats)
  const cacheRead = usage.cache_read_input_tokens || usage.cached_tokens;
  if (cacheRead) msg += ` | cache_read=${cacheRead}`;

  const cacheCreation = usage.cache_creation_input_tokens;
  if (cacheCreation) msg += ` | cache_create=${cacheCreation}`;

  const reasoning = usage.reasoning_tokens;
  if (reasoning) msg += ` | reasoning=${reasoning}`;

  console.log(msg);

  // Save to usage DB
  const tokens = {
    prompt_tokens: inTokens,
    completion_tokens: outTokens,
    cache_read_input_tokens: cacheRead || 0,
    cache_creation_input_tokens: cacheCreation || 0,
    reasoning_tokens: reasoning || 0
  };
  saveRequestUsage({ model, provider, connectionId, tokens }).catch(() => { });
  appendRequestLog({ model, provider, connectionId, tokens, status: "200 OK" }).catch(() => { });
}