Inputs

Meeting & Recording Ingestion

How recordings become structured task inputs and artifacts.

8 min read

PlanToCode can process meeting recordings and screen captures to extract task-relevant information. This document describes the ingestion workflow from recording to structured artifacts.

Recording ingestion flow

How recordings flow through transcription and analysis.

Recording ingestion flow diagram

Supported Inputs

The meeting ingestion pipeline accepts various recording formats:

  • Screen recordings (MP4, WebM, MOV)
  • Meeting recordings from Zoom, Meet, Teams
  • Audio-only files (MP3, WAV, M4A)
  • Direct screen capture from desktop

Upload Process

Recordings are uploaded to the server as multipart form data.
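A minimal client-side sketch of such an upload. The endpoint path and form field names here are assumptions for illustration, not the actual PlanToCode API:

```typescript
// Hypothetical upload helper; endpoint and field names are assumed.
function buildRecordingForm(file: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("recording", file, filename); // binary payload
  return form;
}

async function uploadRecording(file: Blob, filename: string): Promise<Response> {
  // fetch sets the multipart boundary and Content-Type automatically.
  return fetch("/api/recordings", {
    method: "POST",
    body: buildRecordingForm(file, filename),
  });
}
```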

Processing Steps

  1. File uploaded to server temporary storage
  2. Metadata extracted (duration, format, resolution)
  3. Audio track separated for transcription
  4. Video frames sampled for visual analysis
  5. Results combined and returned to client
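The steps above can be sketched as a small pipeline. The stage signatures below are illustrative, not the server's actual interfaces:

```typescript
// Illustrative ingestion pipeline; types and stage names are assumptions
// that mirror the numbered steps, not the real server code.
type Recording = { path: string };
type Metadata = { durationSec: number; format: string };

interface Stages {
  extractMetadata(r: Recording): Metadata; // step 2
  splitAudio(r: Recording): string;        // step 3: path to audio track
  sampleFrames(r: Recording): string[];    // step 4: frame image paths
}

function ingest(r: Recording, s: Stages) {
  const metadata = s.extractMetadata(r);
  const audio = s.splitAudio(r);
  const frames = s.sampleFrames(r);
  return { metadata, audio, frames }; // step 5: combined result
}
```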

Format Normalization

Various input formats are normalized before processing. Audio is converted to 16 kHz mono WAV for Whisper compatibility. Video is processed at native resolution with configurable frame sampling.

Normalized outputs ensure consistent downstream processing regardless of input format.
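The audio half of this normalization maps onto standard ffmpeg flags. Invoking ffmpeg this way is an assumption about the implementation, but the target format (16 kHz mono WAV) comes from the text:

```typescript
// Builds an ffmpeg argument list for the normalization described above.
// Using ffmpeg at all is an assumption; the flags themselves are standard.
function audioNormalizeArgs(input: string, output: string): string[] {
  return [
    "-i", input,
    "-vn",          // drop the video stream
    "-ar", "16000", // resample to 16 kHz for Whisper
    "-ac", "1",     // downmix to mono
    output,         // .wav extension selects WAV output
  ];
}
// e.g. spawn("ffmpeg", audioNormalizeArgs("meeting.mp4", "meeting.wav"))
```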

Multimodal Analysis

Recordings with both audio and video are analyzed using multimodal models. Models with the google/* prefix support native video understanding.

Audio transcription and visual analysis are combined to produce a comprehensive understanding of the recording content.
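A sketch of how a client might branch on that model-ID prefix; the helper name and fallback behavior are assumptions for illustration:

```typescript
// Hypothetical routing check based on the google/* prefix mentioned above.
function supportsNativeVideo(modelId: string): boolean {
  return modelId.startsWith("google/");
}
// Models without native video support would fall back to separate audio
// transcription plus sampled-frame analysis.
```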

Audio Transcription

Audio tracks are transcribed using OpenAI Whisper through the server API.

Speaker diarization attempts to attribute text to different speakers when multiple voices are detected.

Transcription Features

  • Multiple language support with auto-detection
  • Word-level timestamps for alignment
  • Speaker diarization (multi-speaker)
  • Punctuation and formatting restoration
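Word-level timestamps are what make diarization usable: each word can be attributed to whichever speaker turn contains it. The types and merging logic below are illustrative, not the actual server implementation:

```typescript
// Attributes transcribed words to diarized speaker turns by timestamp.
// Data shapes are assumptions for illustration.
type Word = { text: string; startSec: number };
type Turn = { speaker: string; startSec: number; endSec: number };

function attributeWords(words: Word[], turns: Turn[]): Map<string, string[]> {
  const bySpeaker = new Map<string, string[]>();
  for (const w of words) {
    const turn = turns.find(t => w.startSec >= t.startSec && w.startSec < t.endSec);
    const speaker = turn ? turn.speaker : "unknown";
    if (!bySpeaker.has(speaker)) bySpeaker.set(speaker, []);
    bySpeaker.get(speaker)!.push(w.text);
  }
  return bySpeaker;
}
```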

Frame Sampling

Video frames are sampled at configurable intervals to capture UI state changes and user actions.

Each frame includes its timestamp for correlation with the audio transcript.
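Computing the sample timestamps is straightforward; the default interval below is an assumption, since the text only says the interval is configurable:

```typescript
// Timestamps (in seconds) at which to sample frames from a recording.
// The 2-second default is an assumed value, not a documented one.
function sampleTimestamps(durationSec: number, intervalSec = 2): number[] {
  const stamps: number[] = [];
  for (let t = 0; t < durationSec; t += intervalSec) stamps.push(t);
  return stamps;
}
// sampleTimestamps(10, 3) → [0, 3, 6, 9]
```

Each returned timestamp pairs the extracted frame with a position in the audio transcript.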

Structured Extraction

The combined analysis produces structured outputs suitable for planning:

Extracted Elements

  • Action items and decisions mentioned
  • UI elements and navigation paths shown
  • Error states and issues demonstrated
  • Technical context for implementation

Analysis Artifacts

Meeting analysis produces several artifacts stored in the session:

  • meeting_transcript: Full text with timestamps
  • action_items: Extracted tasks and decisions
  • ui_observations: Visual state changes
  • combined_context: Merged analysis summary
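The artifact names above suggest a session shape like the following; the field types are assumptions for illustration, only the artifact names come from the text:

```typescript
// Assumed shape of the stored meeting-analysis artifacts.
interface TranscriptEntry {
  startSec: number; // timestamp within the recording
  speaker?: string; // present when diarization attributed a speaker
  text: string;
}

interface MeetingAnalysis {
  meeting_transcript: TranscriptEntry[]; // full text with timestamps
  action_items: string[];                // extracted tasks and decisions
  ui_observations: string[];             // visual state changes
  combined_context: string;              // merged analysis summary
}

const example: MeetingAnalysis = {
  meeting_transcript: [{ startSec: 0, text: "Let's fix the login bug." }],
  action_items: ["Fix the login bug"],
  ui_observations: ["Error toast shown on submit"],
  combined_context: "Login form fails on submit; validation needs fixing.",
};
```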

Key Source Files

  • desktop/src/components/meeting/MeetingUploader.tsx
  • server/src/handlers/proxy/video_handler.rs
  • server/src/services/video_processor.rs

Planning Handoff

Meeting analysis artifacts can be incorporated into the task description.

The combined context flows into the file discovery and plan generation pipeline, providing rich context for implementation planning.
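One way that handoff could look, as an illustrative merge function; the section formatting and parameter names are assumptions:

```typescript
// Merges analysis artifacts into a task description for planning.
// Layout and headings are assumed, not the actual handoff format.
function buildTaskDescription(
  task: string,
  combinedContext: string,
  actionItems: string[],
): string {
  const items = actionItems.map(i => `- ${i}`).join("\n");
  return `${task}\n\n## Meeting context\n${combinedContext}\n\n## Action items\n${items}`;
}
```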

Continue to video analysis

Learn more about how video frames are analyzed.