Video Analysis
Frame sampling, prompts, and analysis artifacts from recordings.
Video analysis extracts UI state and action sequences from screen recordings, making user workflows and bug-reproduction contexts available for downstream planning.
[Diagram] Video analysis pipeline: how frames flow through the analysis model.

API Endpoint
Video analysis is handled by /api/llm/video/analyze on the server. The endpoint accepts multipart form data containing the video file and analysis parameters; a request sketch follows the field list below.
Payload Fields
- video: The video file (MP4, WebM, MOV)
- model: Model identifier for analysis
- prompt: Optional custom analysis prompt
- max_frames: Maximum frames to sample
- fps: Frame sampling rate
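A minimal client-side sketch of building the multipart request, assuming the server is reachable at baseUrl and uses bearer-token auth (both assumptions; only the endpoint path and field names come from this page). The model identifier shown is hypothetical.

```typescript
// Sketch of a client-side upload; baseUrl and the auth scheme are assumptions.
async function analyzeVideo(file: File, baseUrl: string, token: string) {
  const form = new FormData();
  form.append("video", file);                       // MP4, WebM, or MOV
  form.append("model", "google/gemini-2.0-flash");  // hypothetical model id
  form.append("prompt", "Describe the UI state and user actions.");
  form.append("max_frames", "30");
  form.append("fps", "1");

  const res = await fetch(`${baseUrl}/api/llm/video/analyze`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: form, // the browser sets the multipart boundary automatically
  });
  if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
  return res.json();
}
```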
Supported Input Formats
- MP4 with H.264 or H.265 codec
- WebM with VP8 or VP9 codec
- MOV from screen recording tools
- Maximum file size: 100MB
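A pre-upload check based on the constraints above can fail fast before the file leaves the client. This sketch uses the documented 100MB cap and the listed container formats; note that MIME-type checks cannot verify the codec (H.264 vs. H.265), so server-side validation remains authoritative.

```typescript
// Minimal client-side validation against the documented limits.
const MAX_BYTES = 100 * 1024 * 1024;
const ALLOWED_TYPES = ["video/mp4", "video/webm", "video/quicktime"];

function validateRecording(file: File): string | null {
  if (file.size > MAX_BYTES) return "File exceeds the 100MB limit";
  if (!ALLOWED_TYPES.includes(file.type)) return `Unsupported format: ${file.type}`;
  return null; // null means the file passes client-side checks
}
```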
Frame Sampling
Frames are extracted at configurable intervals to balance coverage and API costs. Lower frame rates reduce token usage but may miss rapid changes.
The default rate is 1 frame per second; detailed UI analysis may need 2-3 FPS. The parameters below control sampling, and a sketch of the resulting frame timestamps follows the list.
Sampling Parameters
- fps: Frames per second to extract (0.5-5)
- max_frames: Maximum total frames (10-100)
- start_time: Offset to begin sampling
- end_time: Offset to stop sampling
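The sketch below shows how these parameters could translate into frame timestamps. The actual extraction lives in server/src/services/video_processor.rs; this mirrors the documented semantics, not the real implementation, and treats the offsets as seconds (an assumption).

```typescript
// Compute sample timestamps from the documented parameters.
function sampleTimestamps(
  durationSec: number,
  fps = 1,          // 0.5-5 frames per second
  maxFrames = 30,   // 10-100 total frames
  startTime = 0,    // offset to begin sampling, assumed seconds
  endTime = durationSec,
): number[] {
  const interval = 1 / fps;
  const timestamps: number[] = [];
  for (let t = startTime; t < endTime && timestamps.length < maxFrames; t += interval) {
    timestamps.push(t);
  }
  return timestamps;
}

// A 60s clip at the default 1 FPS, capped at 30 frames, samples t = 0..29s.
```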
Model Requirements
Video analysis requires vision-capable models. Model identifiers follow the provider/model format, and currently only google/* models support native video analysis. Google Gemini models can process video natively; other vision models fall back to frame-by-frame image analysis.
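Routing could reduce to a prefix check on the model identifier, per the note above. This is a sketch of that rule, not the server's actual dispatch logic; the model ids in the comments are hypothetical.

```typescript
// google/* models get native video input; everything else gets per-frame images.
function supportsNativeVideo(model: string): boolean {
  return model.startsWith("google/");
}

// supportsNativeVideo("google/gemini-2.0-flash") === true   (hypothetical id)
// supportsNativeVideo("openai/gpt-4o")           === false  (frame-by-frame fallback)
```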
Analysis Process
Sampled frames are sent to the vision model along with the analysis prompt. The model produces structured observations about UI state and user actions.
System prompts guide the model to focus on specific aspects of the recording; a prompt-assembly sketch follows the element list below.
Prompt Elements
- UI inventory: List visible elements and controls
- Action sequence: Describe user actions in order
- Error detection: Identify error states and messages
- Navigation paths: Track screen transitions
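One way to assemble a system prompt from the elements above. The instruction wording here is illustrative, not the prompt the server actually ships.

```typescript
// Map each documented prompt element to an illustrative instruction.
const PROMPT_ELEMENTS: Record<string, string> = {
  ui_inventory: "List all visible elements and controls in each frame.",
  action_sequence: "Describe the user's actions in chronological order.",
  error_detection: "Identify any error states, dialogs, or messages.",
  navigation_paths: "Track transitions between screens or views.",
};

function buildAnalysisPrompt(elements: string[] = Object.keys(PROMPT_ELEMENTS)): string {
  const instructions = elements.map((key) => `- ${PROMPT_ELEMENTS[key]}`).join("\n");
  return `Analyze this screen recording.\n${instructions}`;
}
```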
Analysis Outputs
- frame_observations: Per-frame UI descriptions
- action_timeline: Ordered list of user actions
- error_summary: Any errors or issues observed
- context_summary: High-level workflow description
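These field names suggest a result shape like the following; the exact nesting and field types are assumptions derived from the list above.

```typescript
// Plausible shape for the analysis payload; structure is assumed.
interface VideoAnalysis {
  frame_observations: { timestamp: number; description: string }[];
  action_timeline: { order: number; action: string }[];
  error_summary: string | null; // null when no errors were observed
  context_summary: string;      // high-level workflow description
}
```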
Token Usage & Billing
Video analysis consumes tokens based on frame count and resolution; each sampled frame is billed as image input tokens. A cost-arithmetic sketch follows the field list below.
- tokens_sent: Prompt + image tokens
- tokens_received: Analysis response tokens
- actual_cost: Computed from model pricing
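A sketch of the cost arithmetic implied by these fields. The per-token rates are placeholders, not real model pricing; image tokens are counted into tokens_sent.

```typescript
// actual_cost from token counts; prices here are hypothetical placeholders.
function computeCost(
  tokensSent: number,
  tokensReceived: number,
  inputPricePerMTok = 0.1,   // hypothetical $/1M input tokens
  outputPricePerMTok = 0.4,  // hypothetical $/1M output tokens
): number {
  return (
    (tokensSent / 1_000_000) * inputPricePerMTok +
    (tokensReceived / 1_000_000) * outputPricePerMTok
  );
}
```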
Result Storage
Analysis results are stored in the background_jobs table with task_type 'video_analysis'. The response contains the full analysis in JSON format.
Results can be incorporated into task descriptions or used directly in the planning workflow.
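A hedged sketch of retrieving a stored result, reusing the VideoAnalysis shape sketched above. The /api/jobs/:id route and the row fields are assumptions based on the background_jobs description; only task_type 'video_analysis' and the JSON response column come from this page.

```typescript
// Fetch a background job and parse the stored analysis JSON.
// The route and row shape are assumptions, not a documented API.
async function getAnalysisResult(baseUrl: string, jobId: string): Promise<VideoAnalysis> {
  const res = await fetch(`${baseUrl}/api/jobs/${jobId}`);
  if (!res.ok) throw new Error(`Job lookup failed: ${res.status}`);
  const job = await res.json();
  if (job.task_type !== "video_analysis") {
    throw new Error(`Unexpected task_type: ${job.task_type}`);
  }
  return JSON.parse(job.response) as VideoAnalysis; // full analysis as JSON
}
```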
Key Source Files
- server/src/handlers/proxy/video_handler.rs
- server/src/services/video_processor.rs
- desktop/src/components/video/VideoAnalyzer.tsx
Integration with Planning
Video analysis outputs can feed directly into the task description for context-aware planning.
The context_summary is particularly useful as a starting point for implementation planning.
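Illustrative only: one way to prepend the context_summary to a task description so the planner sees the recorded workflow. The surrounding task API and the markdown framing are assumptions.

```typescript
// Enrich a task description with the recorded workflow context.
function enrichTaskDescription(description: string, analysis: VideoAnalysis): string {
  return `## Recorded context\n${analysis.context_summary}\n\n${description}`;
}
```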
See meeting ingestion to learn how video analysis fits into the broader meeting ingestion workflow.