Video Analysis
Adaptive analysis and prompts for screen recordings.
Video analysis sends the recording to Gemini video models with a system prompt that adapts to your goal. The output is a text summary, not a frame-by-frame export or separate transcript.
[Figure: Video analysis pipeline. How recordings flow through the analysis model.]

API Endpoint
Video analysis is handled by /api/llm/video/analyze on the server. The endpoint accepts multipart form data with the video file and analysis parameters.
Payload Fields
- video: The video file
- model: Model identifier for analysis (google/* required)
- prompt: Task description and optional focus prompt (wrapped in <description> and <video_attention_prompt> tags, respectively)
- temperature: Sampling temperature from task settings
- durationMs: Recording duration in milliseconds
- framerate: Sampling hint in frames per second (0.1-20 in the UI)
- systemPrompt: Composed system prompt (server-generated)
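The prompt wrapping and text fields listed above can be sketched as follows. This is a minimal illustration, not the app's client code: the helper names build_prompt and build_form_fields are hypothetical, the model identifier in the usage note is a placeholder, and the exact tag layout (line breaks, and omitting the focus tag when no focus prompt is given) is an assumption. The video file part is not shown, and systemPrompt is left out because it is described as server-generated.

```python
from typing import Dict, Optional


def build_prompt(description: str, focus: Optional[str] = None) -> str:
    """Wrap the task description and optional focus prompt in their tags."""
    parts = [f"<description>\n{description.strip()}\n</description>"]
    # Assumption: the focus tag is simply omitted when there is no focus prompt.
    if focus and focus.strip():
        parts.append(
            f"<video_attention_prompt>\n{focus.strip()}\n</video_attention_prompt>"
        )
    return "\n\n".join(parts)


def build_form_fields(
    model: str,
    description: str,
    focus: Optional[str],
    temperature: float,
    duration_ms: int,
    framerate: float,
) -> Dict[str, str]:
    """Return the text fields of the multipart form (the video part is separate)."""
    # Multipart text fields are sent as strings, so numbers are stringified.
    return {
        "model": model,
        "prompt": build_prompt(description, focus),
        "temperature": str(temperature),
        "durationMs": str(duration_ms),
        "framerate": str(framerate),
    }
```

For example, build_form_fields("google/example-video-model", "Fix the login bug", "Watch the console", 0.2, 90000, 5) returns a dictionary whose prompt value contains both tagged blocks.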
Supported Input Formats
- MP4, WebM, MOV, and AVI are common inputs
- Large files may be uploaded through the provider's File API
- Long recordings are chunked by the desktop app before analysis
Frame rate hint
The frame rate (FPS) is a hint for how densely to sample the video, not a guarantee. For large files the provider may ignore it; for long recordings the desktop app may downsample when chunking.
The default recorder rate is 5 FPS. Lower rates reduce cost but may miss rapid UI changes.
Sampling Parameters
- framerate: 0.1-20 selection in the UI (provider requests are clamped to 1-20)
- chunking: long recordings split into 2-minute segments
- audio: include narration when "Include dictation" is enabled
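The clamping and chunking rules above can be sketched as two small functions. This is an illustrative sketch rather than the desktop app's implementation: the function names are hypothetical, and keeping a shorter final segment (instead of merging it into the previous one) is an assumption. The source does not say how long a recording must be before chunking applies, so the sketch just splits whatever duration it is given.

```python
from typing import List, Tuple

CHUNK_MS = 2 * 60 * 1000   # 2-minute segments, as listed above
MIN_PROVIDER_FPS = 1.0     # provider requests are clamped to 1-20
MAX_PROVIDER_FPS = 20.0


def clamp_framerate(ui_fps: float) -> float:
    """Clamp a UI selection (0.1-20) into the provider range (1-20)."""
    return max(MIN_PROVIDER_FPS, min(MAX_PROVIDER_FPS, ui_fps))


def chunk_ranges(duration_ms: int, chunk_ms: int = CHUNK_MS) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) pairs that cover the whole recording."""
    if duration_ms <= 0:
        return []
    # Assumption: the last segment may be shorter than chunk_ms.
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]
```

Under these assumptions, a UI selection of 0.1 FPS is sent to the provider as 1 FPS, and a 5-minute recording yields three segments: two full 2-minute segments and one 1-minute remainder.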
Model Requirements
Video analysis requires Google Gemini models that accept video input. Model identifiers follow the provider/model format, and the server rejects any identifier that is not google/*.
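A client can check the identifier format before uploading anything. The sketch below covers only the provider/model shape and the google prefix; the function name is hypothetical, and the server may apply further checks (for example, whether the specific model accepts video) that are not reproduced here.

```python
def is_supported_video_model(model_id: str) -> bool:
    """Return True when model_id looks like google/<model>."""
    provider, sep, model = model_id.partition("/")
    # Require the separator, the google provider, and a non-empty model name.
    return sep == "/" and provider == "google" and model != ""
```

Running this check before upload avoids sending a large video only to have the request rejected for an unsupported provider.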
Analysis Process
The model analyzes the full video (and audio if present) and produces a goal-oriented summary.
The default system prompt (default_video_analysis) tells the model to adapt to your goal, quote visible text when relevant, and mark unclear content instead of guessing.
Prompt Elements
- Goal alignment: focus on the user's stated intent
- Evidence: quote visible errors, logs, or UI text when relevant
- Sequence: describe the order of events or steps shown
- Next steps: suggest fixes or follow-up tasks
Analysis Outputs
- Analysis summary text tailored to the prompt
- Quoted errors or UI text when visible
- Workflow notes describing what happened on screen
- Suggested fixes or follow-up tasks
Token Usage & Billing
Video analysis usage and cost are tracked per job using provider-reported tokens or duration-based estimates.
- tokens_sent: Prompt + video tokens
- tokens_received: Analysis response tokens
- actual_cost: Computed from model pricing
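As a rough sketch of how a cost figure could be derived from the two token counts, the function below multiplies each count by a per-million-token price. The pricing structure, the function name, and the prices in the usage note are all assumptions for illustration; the source says only that actual_cost is computed from model pricing, and it does not describe how duration-based estimates are calculated.

```python
from decimal import Decimal


def estimate_cost(
    tokens_sent: int,
    tokens_received: int,
    input_price_per_million: Decimal,
    output_price_per_million: Decimal,
) -> Decimal:
    """Estimate a job's cost from token counts and per-million-token prices."""
    million = Decimal(1_000_000)
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    input_cost = Decimal(tokens_sent) * input_price_per_million
    output_cost = Decimal(tokens_received) * output_price_per_million
    return (input_cost + output_cost) / million
```

With placeholder prices of 0.50 per million input tokens and 2.00 per million output tokens, a job with 200,000 tokens sent and 1,000 tokens received would come to 0.102.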
Result Storage
Analysis results are stored in background_jobs.response with task_type "video_analysis". Long recordings may include chunk metadata.
Results can be incorporated into task descriptions or used directly in the planning workflow.
Key Source Files
- desktop/src/app/components/generate-prompt/_components/video-recording-dialog.tsx
- desktop/src/contexts/screen-recording/Provider.tsx
- desktop/src-tauri/src/jobs/processors/video_analysis_processor.rs
- server/src/handlers/proxy/specialized/video_analysis.rs
- server/src/clients/google_client.rs
Integration with Planning
Video analysis summaries can be appended to the task description for context-aware planning.
Use text_improvement or task_refinement to polish the summary before file discovery.
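Appending a summary to a task description can be sketched as below. The function name and the "Video analysis summary:" label are hypothetical; the source does not specify how the app delimits the appended text.

```python
def append_video_summary(task_description: str, summary: str) -> str:
    """Append a video analysis summary to a task description."""
    summary = summary.strip()
    if not summary:
        # Nothing to add; leave the description untouched.
        return task_description
    # Assumption: a plain-text label separates the summary from the description.
    return f"{task_description.rstrip()}\n\nVideo analysis summary:\n{summary}"
```

The combined text can then be passed through text_improvement or task_refinement before file discovery, as described above.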
See meeting ingestion
Learn more about how video analysis works.