Video Analysis

Adaptive analysis and prompts for screen recordings.

September 25, 2025 • 6 min read

Video analysis sends the recording to Gemini video models with a system prompt that adapts to your goal. The output is a text summary, not a frame-by-frame export or separate transcript.

Figure: Video analysis pipeline. How recordings flow through the analysis model.

Figure: Video analysis interface, showing the analysis options.

API Endpoint

Video analysis is handled by /api/llm/video/analyze on the server. The endpoint accepts multipart form data with the video file and analysis parameters.

Payload Fields

video: The video file
model: Model identifier for analysis (google/* required)
prompt: Task description and optional focus prompt (wrapped in <description> and <video_attention_prompt>)
temperature: Sampling temperature from task settings
durationMs: Recording duration in milliseconds
framerate: Sampling hint (0.1-20 from the UI)
systemPrompt: Composed system prompt (server-generated)
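
As a rough illustration of how these fields fit together, the sketch below assembles the text fields of the form in Python. The helper names (wrap_prompt, build_form_fields) and the example model identifier are hypothetical; only the field names and the two wrapper tags come from the list above. The exact tag layout the server expects is an assumption, the video file part is left out, and systemPrompt is omitted because it is described as server-generated.

```python
from typing import Dict, Optional


def wrap_prompt(description: str, focus: Optional[str] = None) -> str:
    """Wrap the task description and the optional focus prompt in the tags named above.

    Assumption: each value gets its own tag pair, joined by a newline.
    """
    parts = [f"<description>{description}</description>"]
    if focus:
        parts.append(f"<video_attention_prompt>{focus}</video_attention_prompt>")
    return "\n".join(parts)


def build_form_fields(
    model: str,
    description: str,
    duration_ms: int,
    framerate: float,
    temperature: float,
    focus: Optional[str] = None,
) -> Dict[str, str]:
    """Return the non-file form fields as strings, as multipart text parts would carry them."""
    return {
        "model": model,
        "prompt": wrap_prompt(description, focus),
        "temperature": str(temperature),
        "durationMs": str(duration_ms),
        "framerate": str(framerate),
    }


# "google/example-model" is a placeholder, not a real model identifier.
fields = build_form_fields(
    "google/example-model", "Fix login bug", 90000, 5, 0.2, focus="Watch the console"
)
```

The resulting dictionary would be sent alongside the video file part in a multipart POST to /api/llm/video/analyze; the HTTP client and any authentication are out of scope for this sketch.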

Supported Input Formats

MP4, WebM, MOV, and AVI are common inputs
Large files may be uploaded through the provider's File API
Long recordings are chunked by the desktop app before analysis

Frame rate hint

FPS is a hint for how densely to sample the video. For large files the provider may ignore it; for long recordings the desktop app may downsample when chunking.

Default recorder rate is 5 FPS. Lower rates reduce cost but may miss rapid UI changes.

Sampling Parameters

framerate: 0.1-20 selection in the UI (provider requests are clamped to 1-20)
chunking: long recordings split into 2-minute segments
audio: include narration when "Include dictation" is enabled
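
The clamping and chunking rules above can be sketched as follows. This is a minimal illustration, not the app's implementation: the function names are hypothetical, and the page does not say how long a recording must be before it is chunked, so the sketch simply splits any duration into segments of at most two minutes.

```python
from typing import List, Tuple

CHUNK_MS = 2 * 60 * 1000  # 2-minute segments, per the list above
MIN_PROVIDER_FPS = 1.0    # provider requests are clamped to 1-20
MAX_PROVIDER_FPS = 20.0


def clamp_framerate(ui_fps: float) -> float:
    """Map a UI selection (0.1-20) into the 1-20 range sent to the provider."""
    return max(MIN_PROVIDER_FPS, min(MAX_PROVIDER_FPS, ui_fps))


def chunk_ranges(duration_ms: int, chunk_ms: int = CHUNK_MS) -> List[Tuple[int, int]]:
    """Split a recording into (start_ms, end_ms) segments of at most chunk_ms each."""
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]
```

For example, a UI selection of 0.1 FPS would reach the provider as 1 FPS, and a 5-minute recording would yield two full 2-minute segments followed by a 1-minute remainder.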

Model Requirements

Video analysis requires Gemini video models. Model identifiers follow the provider/model format, and the server accepts only google/* models that support video input.
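
A client could run a similar check before uploading. The sketch below is an assumption about what such a check might look like, with a hypothetical function name; it only validates the provider prefix, whereas the server may additionally verify that the specific model accepts video.

```python
def is_supported_video_model(model_id: str) -> bool:
    """Return True if model_id has the form google/<model> with a non-empty model name."""
    provider, sep, model = model_id.partition("/")
    return sep == "/" and provider == "google" and model != ""
```

Identifiers from other providers, or a bare "google" without a model name, would be rejected before any upload is attempted.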

Analysis Process

The model analyzes the full video (and audio if present) and produces a goal-oriented summary.

The default system prompt (default_video_analysis) tells the model to adapt to your goal, quote visible text when relevant, and mark unclear content instead of guessing.

Prompt Elements

Goal alignment: focus on the user's stated intent
Evidence: quote visible errors, logs, or UI text when relevant
Sequence: describe the order of events or steps shown
Next steps: suggest fixes or follow-up tasks

Analysis Outputs

Analysis summary text tailored to the prompt
Quoted errors or UI text when visible
Workflow notes describing what happened on screen
Suggested fixes or follow-up tasks

Token Usage & Billing

Video analysis usage and cost are tracked per job using provider-reported tokens or duration-based estimates.

tokens_sent: Prompt + video tokens
tokens_received: Analysis response tokens
actual_cost: Computed from model pricing
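
To make the relationship between these fields concrete, here is a minimal sketch of a token-based cost calculation. The pricing scheme (separate input and output prices per million tokens), the function name, and the sample prices are all assumptions for illustration; the page only states that actual_cost is computed from model pricing.

```python
def estimate_cost(
    tokens_sent: int,
    tokens_received: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate job cost from token counts and hypothetical per-million-token prices."""
    return (
        tokens_sent * input_price_per_million
        + tokens_received * output_price_per_million
    ) / 1_000_000


# Hypothetical prices: 2.0 per million input tokens, 4.0 per million output tokens.
cost = estimate_cost(1_000_000, 500_000, 2.0, 4.0)
```

When the provider does not report token counts, the duration-based estimate mentioned above would take the place of the token inputs; how that estimate is derived is not described here.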

Result Storage

Analysis results are stored in background_jobs.response with task_type "video_analysis". Long recordings may include chunk metadata.

Results can be incorporated into task descriptions or used directly in the planning workflow.

Key Source Files

desktop/src/app/components/generate-prompt/_components/video-recording-dialog.tsx
desktop/src/contexts/screen-recording/Provider.tsx
desktop/src-tauri/src/jobs/processors/video_analysis_processor.rs
server/src/handlers/proxy/specialized/video_analysis.rs
server/src/clients/google_client.rs

Integration with Planning

Video analysis summaries can be appended to the task description for context-aware planning.

Use text_improvement or task_refinement to polish the summary before file discovery.

See meeting ingestion

Learn more about how video analysis works.