Meeting & Recording Ingestion
How recordings become task summaries and planning inputs.
PlanToCode can analyze meeting recordings and screen captures with the video analysis job. The model is guided by a system prompt that adapts to your goal, whether you are debugging, reviewing UI, or documenting a workflow.
Recording ingestion flow
How recordings flow through upload and analysis.
Supported Inputs
The ingestion workflow accepts video recordings captured in the app or uploaded from other tools.
- Screen recordings captured in the desktop app
- Meeting recordings exported from Zoom, Meet, or Teams (video files)
- Design walkthroughs or bug reproductions recorded as video
- Audio-only notes are handled separately; use voice transcription for those
Upload Process
Recordings are uploaded to the server as multipart form data for analysis.
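As a rough illustration of what a multipart form-data upload looks like on the wire, the sketch below encodes one text field and one file using only the Python standard library. The field names `prompt` and `video`, the file name, and the `video/webm` content type are assumptions made for this example; the actual field names used by the server are not documented here.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="video/webm"):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields (for example, the analysis prompt).
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f'{value}\r\n').encode("utf-8")
        )
    # The file part carries a filename and its own content type.
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: {content_type}\r\n\r\n').encode("utf-8")
        + file_bytes + b"\r\n"
    )
    # Closing boundary terminates the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    body = b"".join(parts)
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return body, headers


# Hypothetical field names and a 4-byte placeholder instead of a real video.
body, headers = build_multipart(
    {"prompt": "Summarize the bug reproduction"},
    "video", "recording.webm", b"\x1a\x45\xdf\xa3",
)
```

The resulting `body` and `headers` could then be sent with any HTTP client; the sketch stops short of the network call because authentication and the exact request contract are outside what this page describes.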
Processing Steps
- The desktop app saves the recording locally and calculates its duration
- The video file and analysis prompt are uploaded to /api/llm/video/analyze
- The server stores the file temporarily and routes it to Gemini video models
- Long recordings are split into 2-minute chunks by the desktop app and processed in parallel
- The analysis summary is returned and stored in the job response
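The chunking step can be sketched as follows. This is a minimal illustration, assuming fixed 2-minute windows measured in milliseconds with a shorter final chunk; `analyze_chunk` is a hypothetical stand-in for the real upload and analysis call, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_MS = 2 * 60 * 1000  # 2-minute chunks, as described in the steps above


def chunk_boundaries(duration_ms, chunk_ms=CHUNK_MS):
    """Return (start_ms, end_ms) pairs covering the whole recording."""
    if duration_ms <= 0:
        return []
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def analyze_chunk(bounds):
    """Hypothetical placeholder for analyzing one chunk."""
    start, end = bounds
    return f"summary for {start}-{end} ms"


def analyze_recording(duration_ms):
    """Analyze all chunks in parallel; results stay in chunk order."""
    chunks = chunk_boundaries(duration_ms)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(analyze_chunk, chunks))
```

For a 5-minute recording (300,000 ms) this produces three chunks: two full 2-minute windows and a final 1-minute remainder.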
Format Normalization
Recordings are sent mostly as-is. WebM recordings are remuxed to fix container metadata before analysis.
No separate transcript or frame artifacts are generated; the output is a text analysis summary.
Multimodal Analysis
Recordings are analyzed with google/* video models, which accept video and audio in a single request.
The default video analysis system prompt adapts the output to your goal rather than forcing a fixed schema.
Audio context
Audio is analyzed as part of the video; the app does not generate a standalone transcript.
If spoken content is unclear, the model may mark it as partially visible rather than guessing.
Audio analysis notes
- Narration steers the summary
- Spoken intent and errors can be quoted
- No diarization or timestamped transcript
Frame rate hint
FPS is a sampling hint sent with the analysis request. For large files the provider may ignore it.
Long recordings can be chunked to keep analysis responsive.
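To get a feel for what an FPS hint implies, the sketch below estimates how many frames would be sampled for a given duration. It is simple arithmetic under the assumption that the provider honors the hint; since the provider may ignore it for large files, treat the result as an estimate only. The function name is hypothetical.

```python
import math


def estimated_frames(duration_ms, fps):
    """Estimate sampled frames if the FPS hint is honored."""
    if duration_ms <= 0 or fps <= 0:
        return 0
    # Convert milliseconds to seconds, then multiply by frames per second.
    return math.ceil(duration_ms / 1000 * fps)
```

Under this estimate, a 2-minute chunk sampled at 1 FPS corresponds to about 120 frames, and halving the hint halves the frame count.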
Structured Extraction
The analysis output is freeform and adapts to your prompt. Typical outputs include:
Extracted Elements
- Bug reproduction steps and observed errors
- UI walkthrough notes and navigation paths
- Design feedback or UX issues shown on screen
- Suggested fixes or follow-up tasks
Analysis Artifacts
Video analysis produces artifacts stored with the job:
- analysis_summary: Text summary stored in background_jobs.response
- job_metadata: durationMs, framerate, videoPath
- chunk_info: chunk boundaries for long recordings (when applicable)
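One way to picture these artifacts is as a small typed record. The sketch below mirrors the field names listed above; the Python types, the representation of chunk boundaries as (start, end) pairs in milliseconds, and the example values are assumptions for illustration and may not match the stored format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional, Tuple


@dataclass
class JobMetadata:
    durationMs: int      # recording length in milliseconds
    framerate: float     # FPS sampling hint sent with the request
    videoPath: str       # local path of the saved recording


@dataclass
class VideoAnalysisArtifacts:
    analysis_summary: str                 # text stored in background_jobs.response
    job_metadata: JobMetadata
    # Only present for long recordings; shape is assumed here.
    chunk_info: Optional[List[Tuple[int, int]]] = None


# Example values are made up for illustration.
artifacts = VideoAnalysisArtifacts(
    analysis_summary="Login button throws a 500 after the second click.",
    job_metadata=JobMetadata(durationMs=300_000, framerate=1.0,
                             videoPath="/tmp/recording.webm"),
    chunk_info=[(0, 120_000), (120_000, 240_000), (240_000, 300_000)],
)
serialized = json.dumps(asdict(artifacts))
```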
Key Source Files
- desktop/src/app/components/generate-prompt/_components/video-recording-dialog.tsx
- desktop/src/contexts/screen-recording/Provider.tsx
- desktop/src-tauri/src/jobs/processors/video_analysis_processor.rs
- server/src/handlers/proxy/specialized/video_analysis.rs
- server/src/utils/multipart_utils.rs
- server/src/clients/google_client.rs
Planning Handoff
Video analysis summaries can be incorporated into the task description for planning.
The summary can be refined with text_improvement and task_refinement before file discovery.
Continue to video analysis
Learn more about how video frames are analyzed.