Video Analysis

Adaptive analysis and prompts for screen recordings.

Video analysis sends the recording to Gemini video models with a system prompt that adapts to your goal. The output is a text summary, not a frame-by-frame export or separate transcript.

Video analysis pipeline

How recordings flow through the analysis model.

Figure: The video analysis interface showing analysis options.

API Endpoint

Video analysis is handled by /api/llm/video/analyze on the server. The endpoint accepts multipart form data with the video file and analysis parameters.

Payload Fields

  • video: The video file
  • model: Model identifier for analysis (google/* required)
  • prompt: Task description and optional focus prompt (wrapped in <description> and <video_attention_prompt>)
  • temperature: Sampling temperature from task settings
  • durationMs: Recording duration in milliseconds
  • framerate: Sampling hint (0.1-20 from the UI)
  • systemPrompt: Composed system prompt (server-generated)
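Here is a minimal client-side sketch of assembling this payload. It assumes a runtime with the standard `FormData` and `Blob` globals (browsers, Node 18+). The field names come from the list above. The helper names `wrapPrompt` and `buildVideoAnalyzeForm`, the upload filename, and the exact tag layout inside `prompt` are assumptions for illustration. `systemPrompt` is left out because the server composes it.

```typescript
// Hypothetical helper: wraps the task description and the optional focus prompt
// in the tags named above. The newline layout inside the tags is an assumption.
function wrapPrompt(description: string, focusPrompt?: string): string {
  let prompt = `<description>\n${description}\n</description>`;
  if (focusPrompt && focusPrompt.trim().length > 0) {
    prompt += `\n<video_attention_prompt>\n${focusPrompt}\n</video_attention_prompt>`;
  }
  return prompt;
}

interface VideoAnalyzeParams {
  model: string; // "provider/model"; only google/* is accepted
  description: string;
  focusPrompt?: string;
  temperature: number;
  durationMs: number;
  framerate: number; // UI value, 0.1-20
}

// Hypothetical helper: builds the multipart body for /api/llm/video/analyze.
// systemPrompt is intentionally omitted because it is server-generated.
function buildVideoAnalyzeForm(video: Blob, params: VideoAnalyzeParams): FormData {
  const form = new FormData();
  form.append("video", video, "recording.webm"); // filename is an assumption
  form.append("model", params.model);
  form.append("prompt", wrapPrompt(params.description, params.focusPrompt));
  form.append("temperature", String(params.temperature));
  form.append("durationMs", String(params.durationMs));
  form.append("framerate", String(params.framerate));
  return form;
}
```

When you send the form with `fetch(url, { method: "POST", body: form })`, leave the `Content-Type` header unset. The runtime then adds the multipart boundary itself.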

Supported Input Formats

  • MP4, WebM, MOV, and AVI are common inputs
  • Large files may be uploaded through the provider's File API
  • Long recordings are chunked by the desktop app before analysis
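The sketch below shows how a client might pick a MIME type for these formats and decide when to use the File API path. The extension table uses common registered MIME types, but a provider may expect other spellings. The 20 MB threshold and the function names are assumptions, not values taken from this codebase.

```typescript
// Common MIME types for the formats listed above (assumed; providers may
// expect other spellings). A Map avoids matching Object prototype keys.
const VIDEO_MIME_TYPES = new Map<string, string>([
  ["mp4", "video/mp4"],
  ["webm", "video/webm"],
  ["mov", "video/quicktime"],
  ["avi", "video/x-msvideo"],
]);

// Hypothetical helper: returns the MIME type for a file name, or null if unsupported.
function videoMimeType(fileName: string): string | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return VIDEO_MIME_TYPES.get(ext) ?? null;
}

// Hypothetical threshold: files larger than this go through the provider
// File API instead of being sent inline. 20 MB is an assumed default.
const INLINE_LIMIT_BYTES = 20 * 1024 * 1024;

function shouldUseFileApi(sizeBytes: number, limitBytes: number = INLINE_LIMIT_BYTES): boolean {
  return sizeBytes > limitBytes;
}
```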

Frame rate hint

FPS is a hint for how densely to sample the video. For large files the provider may ignore it, and for long recordings the desktop app may downsample when chunking.

The default recorder rate is 5 FPS. Lower rates reduce cost but may miss rapid UI changes.
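A recording yields roughly its duration multiplied by the sampling rate in frames, and consecutive samples are 1000 / fps milliseconds apart. The small sketch below makes that trade-off concrete. Both function names are hypothetical.

```typescript
// Hypothetical helper: number of frames a recording yields at a given rate.
function sampledFrameCount(durationMs: number, fps: number): number {
  return Math.floor((durationMs / 1000) * fps);
}

// Hypothetical helper: time between consecutive samples. UI changes that
// last less than this interval can fall between two samples.
function frameIntervalMs(fps: number): number {
  return 1000 / fps;
}
```

At the default 5 FPS, a 2-minute segment yields 600 frames spaced 200 ms apart. At 1 FPS the same segment yields 120 frames spaced a full second apart, so a toast visible for half a second can be missed.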

Sampling Parameters

  • framerate: selectable from 0.1 to 20 in the UI; values sent to the provider are clamped to 1-20
  • chunking: long recordings are split into 2-minute segments
  • audio: narration is included when "Include dictation" is enabled
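The sketch below covers two of the numeric rules above. It clamps the UI rate into the provider range and splits a recording into 2-minute spans. All names are hypothetical. The desktop app's actual chunking may differ, for example in how it handles a short final segment. Here the last chunk simply ends at the recording's end.

```typescript
const PROVIDER_MIN_FPS = 1;
const PROVIDER_MAX_FPS = 20;
const CHUNK_MS = 2 * 60 * 1000; // 2-minute segments

// Clamps the UI selection (0.1-20) into the provider's 1-20 range.
function providerFramerate(uiFps: number): number {
  return Math.min(PROVIDER_MAX_FPS, Math.max(PROVIDER_MIN_FPS, uiFps));
}

interface ChunkSpan {
  index: number;
  startMs: number;
  endMs: number;
}

// Splits [0, durationMs) into consecutive spans of at most chunkMs.
function planChunks(durationMs: number, chunkMs: number = CHUNK_MS): ChunkSpan[] {
  if (chunkMs <= 0) throw new RangeError("chunkMs must be positive");
  const chunks: ChunkSpan[] = [];
  for (let start = 0, index = 0; start < durationMs; start += chunkMs, index++) {
    chunks.push({ index, startMs: start, endMs: Math.min(start + chunkMs, durationMs) });
  }
  return chunks;
}
```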

Model Requirements

Video analysis requires a Gemini video model. Model identifiers use the provider/model format, and only google/* models are supported.

The server restricts video analysis to Google Gemini models that accept video inputs.
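The identifier rule can be checked client-side before submitting a job. The sketch below covers only the provider/model prefix rule. The capability restriction (models that accept video) is enforced by the server and cannot be inferred from the string. The function name is hypothetical.

```typescript
// Hypothetical check: identifiers are "provider/model", and only the
// "google" provider is accepted for video analysis.
function isSupportedVideoModel(modelId: string): boolean {
  const slash = modelId.indexOf("/");
  if (slash <= 0) return false; // no separator, or empty provider
  const provider = modelId.slice(0, slash);
  const model = modelId.slice(slash + 1);
  return provider === "google" && model.length > 0;
}
```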

Analysis Process

The model analyzes the full video (and audio if present) and produces a goal-oriented summary.

The default system prompt (default_video_analysis) tells the model to adapt to your goal, quote visible text when relevant, and mark unclear content instead of guessing.

Prompt Elements

  • Goal alignment: focus on the user's stated intent
  • Evidence: quote visible errors, logs, or UI text when relevant
  • Sequence: describe the order of events or steps shown
  • Next steps: suggest fixes or follow-up tasks

Analysis Outputs

  • Analysis summary text tailored to the prompt
  • Quoted errors or UI text when visible
  • Workflow notes describing what happened on screen
  • Suggested fixes or follow-up tasks

Token Usage & Billing

Video analysis usage and cost are tracked per job using provider-reported tokens or duration-based estimates.

  • tokens_sent: Prompt + video tokens
  • tokens_received: Analysis response tokens
  • actual_cost: Computed from model pricing
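The sketch below shows how `actual_cost` can be derived from the two token counts. It assumes that pricing is expressed per million tokens and that the duration-based fallback uses a fixed tokens-per-second rate. Both the pricing shape and the rate are assumptions. Real values come from the model pricing data and the provider.

```typescript
// Hypothetical pricing shape: prices per 1M tokens.
interface ModelPricing {
  inputPerMillion: number;  // applied to tokens_sent
  outputPerMillion: number; // applied to tokens_received
}

function computeActualCost(tokensSent: number, tokensReceived: number, pricing: ModelPricing): number {
  return (tokensSent * pricing.inputPerMillion + tokensReceived * pricing.outputPerMillion) / 1_000_000;
}

// Duration-based fallback when the provider reports no usage.
// tokensPerSecond is an assumed rate, not a documented constant.
function estimateVideoTokens(durationMs: number, tokensPerSecond: number): number {
  return Math.ceil((durationMs / 1000) * tokensPerSecond);
}
```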

Result Storage

Analysis results are stored in background_jobs.response with task_type "video_analysis". Long recordings may include chunk metadata.

Results can be incorporated into task descriptions or used directly in the planning workflow.
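The exact JSON layout of `background_jobs.response` is not documented here. The sketch below therefore uses a hypothetical shape with optional per-chunk entries. It shows one way to turn a chunked result into a single text block, with chunk summaries in index order, each labeled with its time range.

```typescript
// Hypothetical shape of a stored "video_analysis" response; field names are illustrative.
interface ChunkResult {
  index: number;
  startMs: number;
  endMs: number;
  summary: string;
}

interface VideoAnalysisResponse {
  summary?: string;
  chunks?: ChunkResult[];
}

// Formats milliseconds as m:ss.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Prefers a top-level summary; otherwise joins chunk summaries in index order.
function combinedSummary(response: VideoAnalysisResponse): string {
  if (response.summary) return response.summary;
  const chunks = [...(response.chunks ?? [])].sort((a, b) => a.index - b.index);
  return chunks
    .map((c) => `[${formatTimestamp(c.startMs)}-${formatTimestamp(c.endMs)}] ${c.summary}`)
    .join("\n\n");
}
```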

Key Source Files

  • desktop/src/app/components/generate-prompt/_components/video-recording-dialog.tsx
  • desktop/src/contexts/screen-recording/Provider.tsx
  • desktop/src-tauri/src/jobs/processors/video_analysis_processor.rs
  • server/src/handlers/proxy/specialized/video_analysis.rs
  • server/src/clients/google_client.rs

Integration with Planning

Video analysis summaries can be appended to the task description for context-aware planning.

Use text_improvement or task_refinement to polish the summary before file discovery.
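The sketch below shows one way to append a summary to a task description. The "Video analysis:" label and the helper name are assumptions. The helper skips empty summaries and separates the new section from the existing description with a blank line.

```typescript
// Hypothetical helper: appends an analysis summary to a task description
// under a labeled section; empty summaries leave the description unchanged.
function appendVideoAnalysis(taskDescription: string, summary: string): string {
  const trimmed = summary.trim();
  if (trimmed.length === 0) return taskDescription;
  const base = taskDescription.trimEnd();
  const section = `Video analysis:\n${trimmed}`;
  return base.length > 0 ? `${base}\n\n${section}` : section;
}
```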

See meeting ingestion