Provider Routing and Streaming
A routing layer that brokers all external LLM requests, with normalization, streaming, and usage tracking.
Planning, analysis, and transcription jobs call external providers through /api/llm and /api/audio endpoints. The routing service normalizes requests via provider_transformers, streams responses through ModernStreamHandler, and records usage metadata per job.
Provider Routing Map
Diagram of the request flow from the desktop app through the proxy to each provider.
Why a Routing Layer Exists
Direct calls from the desktop client would embed provider credentials and require different payloads per provider. The routing layer keeps keys on the server, exposes a single request format, and maintains consistent streaming behavior.
Security Benefits
- API keys never leave the server
- Per-user rate limiting and quotas
- Request validation before provider calls
Operational Benefits
- Single request format for all providers
- Centralized usage tracking and billing
- Fallback to OpenRouter on failure
Supported Providers
All requests go through a single endpoint: /api/llm/chat/completions. The router determines the appropriate provider based on the model ID in the request payload. Each provider has dedicated handlers in server/src/handlers/proxy/.
| Provider | Routing | Models |
|---|---|---|
| OpenAI | Direct | GPT-5.2, GPT-5.2-Pro, GPT-5-mini, o3, GPT-4o-transcribe |
| Anthropic | Direct (non-streaming), OpenRouter (streaming) | Claude Opus 4.5, Claude Sonnet 4.5 |
| Google | Direct | Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro |
| X.AI | Direct | Grok-4 |
| DeepSeek | Via OpenRouter | DeepSeek-R1 |
| OpenRouter | Direct | Fallback aggregator for all providers |
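As a sketch only: the model-ID lookup can be as simple as matching on the provider prefix. The names below are illustrative, not the actual router.rs logic.

```rust
/// Routing targets mirroring the provider table above (illustrative).
enum Provider {
    OpenAi,
    Anthropic,
    Google,
    Xai,
    OpenRouter,
}

/// Hypothetical lookup: route by the provider prefix of the model ID,
/// falling back to the OpenRouter aggregator.
fn route(model_id: &str, stream: bool) -> Provider {
    match model_id.split('/').next().unwrap_or("") {
        "openai" => Provider::OpenAi,
        // Anthropic streaming goes through OpenRouter (see the table).
        "anthropic" if stream => Provider::OpenRouter,
        "anthropic" => Provider::Anthropic,
        "google" => Provider::Google,
        "x-ai" => Provider::Xai,
        // DeepSeek and anything unrecognized use the aggregator.
        _ => Provider::OpenRouter,
    }
}
```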
Request Normalization via provider_transformers
Job processors submit a normalized payload with task ID, job ID, prompt content, and model selection. The provider_transformers module maps that payload into provider-specific request shapes, as sketched in code after the feature list below.
```jsonc
// Normalized request from desktop
{
  "model": "anthropic/claude-opus-4-5-20251101",
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ],
  "max_tokens": 16384,
  "temperature": 0.7,
  "stream": true,
  "metadata": {
    "job_id": "uuid-...",
    "session_id": "uuid-...",
    "task_type": "implementation_plan"
  }
}
```

```jsonc
// Transformed for Anthropic
{
  "model": "claude-opus-4-5-20251101",
  "system": "...",
  "messages": [{ "role": "user", "content": "..." }],
  "max_tokens": 16384,
  "stream": true
}
```

Transformation Features
- System message extraction for the Anthropic API format
- Vision payload validation for image models
- Token limit enforcement based on model context window
- Provider-specific parameter mapping (top_p, presence_penalty, etc.)
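To make the transformation concrete, here is a minimal sketch of the Anthropic mapping using serde_json. It mirrors the example payloads above; the function name and field handling are assumptions, not the real provider_transformers code.

```rust
use serde_json::{json, Value};

/// Hypothetical transform: convert the normalized payload into the
/// Anthropic request shape shown above.
fn to_anthropic(normalized: &Value) -> Value {
    let messages = normalized["messages"].as_array().cloned().unwrap_or_default();

    // Anthropic takes the system prompt as a top-level field, not a message.
    let system = messages
        .iter()
        .filter(|m| m["role"] == "system")
        .filter_map(|m| m["content"].as_str())
        .collect::<Vec<_>>()
        .join("\n");
    let user_messages: Vec<Value> = messages
        .into_iter()
        .filter(|m| m["role"] != "system")
        .collect();

    json!({
        // Strip the routing prefix ("anthropic/") from the model ID.
        "model": normalized["model"].as_str().unwrap_or("").trim_start_matches("anthropic/"),
        "system": system,
        "messages": user_messages,
        "max_tokens": normalized["max_tokens"],
        "stream": normalized["stream"],
    })
}
```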
Streaming via ModernStreamHandler
Responses are streamed back to the desktop client through ModernStreamHandler, enabling real-time UI updates and progressive plan rendering.
```rust
// ModernStreamHandler processing loop
use futures_util::StreamExt; // required for bytes_stream().next()
use reqwest::Response;

async fn handle_stream(
    response: Response,
    job_id: &str,
) -> Result<StreamResult> {
    let mut stream = response.bytes_stream();
    let mut accumulated = String::new();
    while let Some(chunk) = stream.next().await {
        // Decode the SSE frame into plain text content
        let text = parse_sse_chunk(&chunk?)?;
        accumulated.push_str(&text);
        // Emit event to desktop client for live UI updates
        emit_stream_event(job_id, StreamEvent::Chunk {
            content: text,
            accumulated_tokens: count_tokens(&accumulated),
        });
    }
    // Final usage from provider response
    let usage = extract_final_usage(&accumulated)?;
    Ok(StreamResult { content: accumulated, usage })
}
```

Chunk Events
Token/chunk events forwarded to job listeners for live UI updates
Partial Artifacts
Partial summaries written to job artifacts during streaming
Completion Events
Final events close the job state with usage metadata
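The handler above emits StreamEvent values. Based on the three event kinds just listed, a minimal definition might look like the following; this is a guess at the shape, not the actual type.

```rust
/// A guess at the event type the handler emits, based on the three
/// event kinds above; not the actual definition.
enum StreamEvent {
    /// A decoded SSE chunk plus a running token count for progress UIs.
    Chunk { content: String, accumulated_tokens: usize },
    /// A partial artifact (e.g., an in-progress summary) persisted mid-stream.
    PartialArtifact { artifact: String },
    /// Terminal event carrying final usage metadata for the job.
    Completed { tokens_input: u64, tokens_output: u64, cost: f64 },
}
```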
Fallback to OpenRouter on Failure
When a primary provider fails (rate limit, outage, or error), the routing layer can automatically retry through OpenRouter as a fallback aggregator. This provides resilience without requiring user intervention.
Fallback Behavior
- Primary provider failure triggers OpenRouter retry
- Model mapping ensures equivalent capabilities
- Usage tracked separately for cost attribution
- User notified of fallback in job metadata
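A minimal, self-contained sketch of this retry policy; the function shape and error handling are assumptions, not identifiers from the codebase.

```rust
use std::future::Future;

/// Minimal sketch of the fallback policy. The caller supplies `call`,
/// which issues one request against a named model.
async fn call_with_fallback<F, Fut>(
    call: F,
    primary_model: &str,
    openrouter_model: &str, // equivalent model ID on OpenRouter
) -> Result<String, String>
where
    F: Fn(String) -> Fut,
    Fut: Future<Output = Result<String, String>>,
{
    match call(primary_model.to_string()).await {
        Ok(body) => Ok(body),
        // Rate limit, outage, or other provider error: retry once through
        // the aggregator. Usage is tracked separately for cost attribution,
        // and the fallback is noted in job metadata for the user.
        Err(_provider_error) => call(openrouter_model.to_string()).await,
    }
}
```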
Token Counting and Cost Calculation
Every request records usage metadata so teams can audit cost and performance. Token counts come from provider responses when available, with fallback to tiktoken-based estimation.
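Cost is then derived from token counts and per-model pricing. A minimal sketch, assuming a simple pricing table and that cached reads are billed at a discounted rate in place of full input tokens:

```rust
/// Assumed per-model pricing in USD per million tokens.
struct Pricing {
    input_per_mtok: f64,
    output_per_mtok: f64,
    cache_read_per_mtok: f64, // discounted rate for cached prompt reads
}

/// Sketch: cached reads are assumed to be billed at the discounted rate
/// in place of the full input rate.
fn compute_cost(p: &Pricing, tokens_input: u64, tokens_output: u64, cache_read: u64) -> f64 {
    let full_rate_input = tokens_input.saturating_sub(cache_read);
    (full_rate_input as f64 * p.input_per_mtok
        + cache_read as f64 * p.cache_read_per_mtok
        + tokens_output as f64 * p.output_per_mtok)
        / 1_000_000.0
}
```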
```jsonc
// Usage record stored per request
{
  "tokens_input": 4521,
  "tokens_output": 2847,
  "cache_read_tokens": 1200,  // Anthropic prompt caching
  "cache_write_tokens": 0,
  "cost": 0.0234,  // USD based on model pricing
  "service_name": "anthropic/claude-opus-4-5-20251101",
  "request_id": "550e8400-e29b-41d4-a716-446655440000"  // Server-generated UUID
}
```

Tracked Usage Fields

| Field | Description |
|---|---|
| tokens_input | Prompt tokens consumed by the request |
| tokens_output | Completion tokens generated in response |
| cache_read_tokens | Tokens served from provider cache (Anthropic) |
| cache_write_tokens | Tokens written to provider cache |
| cost | Computed cost based on model pricing |
| service_name | Model identifier used for the request (e.g., anthropic/claude-opus-4-5) |
| request_id | Server-generated UUID for request tracking |

Vision Validation for Image Models
Requests containing images are validated before routing to ensure the selected model supports vision capabilities. Invalid requests fail fast with clear error messages.
Validation Checks
- Model supports vision (checked against config)
- Image format is supported (JPEG, PNG, WebP, GIF)
- Image size within provider limits
- Base64 encoding is valid
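A minimal sketch of these fail-fast checks; the size limit and format list are illustrative, not the server's actual configuration values.

```rust
/// Assumed limits; the real values live in server configuration.
const SUPPORTED_FORMATS: [&str; 4] = ["image/jpeg", "image/png", "image/webp", "image/gif"];
const MAX_IMAGE_BYTES: usize = 20 * 1024 * 1024;

/// Fail-fast validation mirroring the checks listed above.
fn validate_image(
    model_supports_vision: bool, // looked up in the model config
    mime: &str,
    base64_data: &str,
) -> Result<(), String> {
    if !model_supports_vision {
        return Err("selected model does not support vision input".into());
    }
    if !SUPPORTED_FORMATS.contains(&mime) {
        return Err(format!("unsupported image format: {mime}"));
    }
    // Base64 inflates data by ~4/3, so estimate the decoded size cheaply.
    if base64_data.len() / 4 * 3 > MAX_IMAGE_BYTES {
        return Err("image exceeds provider size limit".into());
    }
    // A full implementation would also decode to confirm the base64 is valid.
    Ok(())
}
```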
Vision-Capable Models
- GPT-5.2, GPT-5-mini
- Claude Opus 4.5, Claude Sonnet 4.5
- Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro
- Grok-4
Failure Handling
If a provider fails or no provider is configured, the job is marked failed and the error payload is stored. Users can retry or run the job with another model instead of relying on silent fallbacks.
Rate Limit Errors
Retry-After header respected, user notified of wait time
Authentication Errors
API key validation failed, check provider configuration
Context Length Errors
Prompt exceeds model limit, suggest smaller context or different model
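One way to model these failure categories is a dedicated error type; the variants and fields below are illustrative guesses, not the actual error definitions.

```rust
/// Illustrative failure categories; variant names and fields are guesses.
enum ProxyError {
    /// Provider returned 429; the wait comes from the Retry-After header.
    RateLimited { retry_after_secs: Option<u64> },
    /// 401/403 from the provider: the configured API key was rejected.
    AuthFailed { provider: String },
    /// Prompt exceeds the model's context window; the client suggests a
    /// smaller context or a different model.
    ContextTooLong { prompt_tokens: u64, limit: u64 },
}
```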
Security Boundaries
API keys stay in the server configuration. The desktop client only receives allowed model lists and never embeds provider credentials.
Security Measures
- Key Storage: Provider keys stored in encrypted vault, never sent to clients
- Request Signing: All proxy requests include server-signed JWT for authentication
- Content Filtering: Optional content moderation before sending to providers
- Audit Logging: All requests logged with user context for compliance
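For the request-signing measure, a minimal sketch with the jsonwebtoken crate; the claim fields are assumptions about what the proxy might include, not its real token format.

```rust
use jsonwebtoken::{encode, EncodingKey, Header};
use serde::Serialize;
use std::time::{SystemTime, UNIX_EPOCH};

/// Assumed claim shape; the proxy's real token format may differ.
#[derive(Serialize)]
struct ProxyClaims {
    sub: String, // user ID, kept for audit logging
    exp: usize,  // short expiry limits replay of captured tokens
}

fn sign_request(user_id: &str, secret: &[u8]) -> jsonwebtoken::errors::Result<String> {
    let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs() as usize;
    let claims = ProxyClaims { sub: user_id.to_string(), exp: now + 60 };
    encode(&Header::default(), &claims, &EncodingKey::from_secret(secret))
}
```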
Building a Similar Proxy (Conceptual)
If you are building a similar architecture, the key components to implement are:
- Model-based routing: Look up the model ID to determine which provider to use, then route internally
- Request transformation: Convert normalized requests to provider-specific formats (e.g., extract system messages for Anthropic)
- Streaming handlers: Process SSE chunks from providers and forward to clients with consistent event format
- Usage tracking: Record input/output tokens, cache usage, and costs per request with server-generated request IDs
- Fallback routing: Route certain providers through aggregators (e.g., Anthropic streaming via OpenRouter)
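One way to tie these components together is a provider trait the router dispatches to. This is purely illustrative; the actual code uses provider-specific modules rather than necessarily a single trait.

```rust
use serde_json::Value;

/// Illustrative adapter trait; the real code uses provider-specific
/// modules under server/src/handlers/proxy/providers/.
trait ProviderAdapter {
    /// Convert the normalized request into this provider's wire format.
    fn transform(&self, normalized: &Value) -> Value;
    /// Parse one SSE line from this provider into text content, if any.
    fn parse_chunk(&self, line: &str) -> Option<String>;
    /// Extract final (input, output) token usage from the response body.
    fn extract_usage(&self, body: &Value) -> (u64, u64);
}
```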
Implementation Note
The actual implementation uses Actix-web handlers with provider-specific modules in server/src/handlers/proxy/providers/. See router.rs for the main routing logic.