Inputs

Meeting & Recording Ingestion

How recordings become task summaries and planning inputs.

8 min read

PlanToCode can analyze meeting recordings and screen captures with the video analysis job. The model is guided by a system prompt that adapts to your goal, whether you are debugging, reviewing UI, or documenting a workflow.

Recording ingestion flow

How recordings flow through upload and analysis.

[Diagram: recording ingestion flow]

Supported Inputs

The ingestion workflow accepts video recordings captured in the app or uploaded from other tools.

  • Screen recordings captured in the desktop app
  • Meeting recordings exported from Zoom, Meet, or Teams (video files)
  • Design walkthroughs or bug reproductions recorded as video
  • Audio-only notes are not video inputs; use voice transcription for those instead

Upload Process

Recordings are uploaded to the server as multipart form data for analysis.

Processing Steps

  1. The desktop app saves the recording locally and calculates its duration
  2. The video file and analysis prompt are uploaded to /api/llm/video/analyze
  3. The server stores the file temporarily and routes it to Gemini video models
  4. Long recordings are split by the desktop app into 2-minute chunks, which are processed in parallel
  5. The analysis summary is returned and stored in the job response
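
The 2-minute chunking in step 4 can be illustrated with a minimal sketch. This is not the desktop app's actual code: the `ChunkRange` shape, the `planChunks` name, and the choice to let the final chunk be shorter than two minutes are all assumptions made for illustration.

```typescript
// Hypothetical shape for one chunk; all times are in milliseconds.
interface ChunkRange {
  index: number;
  startMs: number;
  endMs: number;
}

// 2-minute chunk length, as stated in the processing steps.
const CHUNK_MS = 2 * 60 * 1000;

// Split a recording of `durationMs` into consecutive chunks of at most `chunkMs`.
// Assumption: the last chunk simply holds whatever time remains.
function planChunks(durationMs: number, chunkMs: number = CHUNK_MS): ChunkRange[] {
  if (!isFinite(chunkMs) || chunkMs <= 0) {
    throw new Error("chunkMs must be a positive number");
  }
  if (!isFinite(durationMs) || durationMs <= 0) {
    return [];
  }
  const chunks: ChunkRange[] = [];
  let index = 0;
  for (let startMs = 0; startMs < durationMs; startMs += chunkMs) {
    const endMs = Math.min(startMs + chunkMs, durationMs);
    chunks.push({ index: index, startMs: startMs, endMs: endMs });
    index++;
  }
  return chunks;
}
```

Under these assumptions, a 5-minute recording (`planChunks(300000)`) yields three chunks: two full 2-minute chunks and a final 1-minute chunk. Each chunk could then be analyzed independently and the results combined in index order.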

Format Normalization

Recordings are sent mostly as-is. WebM recordings are remuxed to fix container metadata before analysis.

No separate transcript or frame artifacts are generated; the output is a text analysis summary.

Multimodal Analysis

Recordings are analyzed with google/* video models, which accept video and audio in a single request.

The default video analysis system prompt adapts the output to your goal rather than forcing a fixed schema.

Audio context

Audio is analyzed as part of the video; the app does not generate a standalone transcript.

If spoken content is unclear, the model may flag it as uncertain (for example, marking it "partially visible") rather than guessing.

Audio analysis notes

  • Narration steers the summary
  • Spoken intent and errors can be quoted
  • No diarization or timestamped transcript

Frame rate hint

FPS is a sampling hint sent with the analysis request. For large files the provider may ignore it.

Long recordings can be chunked to keep analysis responsive.
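
To get a feel for what the FPS hint means in practice, the sketch below estimates how many frames would be sampled from a clip if the hint were honored exactly. The function name `estimateSampledFrames` is hypothetical, and uniform sampling at the hinted rate is an assumption; as noted above, the provider may ignore the hint.

```typescript
// Rough estimate of sampled frames, ASSUMING the provider samples
// uniformly at the hinted rate. Real provider behavior may differ.
function estimateSampledFrames(durationMs: number, fpsHint: number): number {
  if (!isFinite(durationMs) || durationMs <= 0) {
    return 0;
  }
  if (!isFinite(fpsHint) || fpsHint <= 0) {
    return 0;
  }
  // Convert milliseconds to seconds, multiply by frames per second,
  // and round up so a partial final interval still counts as one frame.
  return Math.ceil((durationMs / 1000) * fpsHint);
}
```

For a single 2-minute chunk, a hint of 1 FPS corresponds to about 120 frames under this assumption, while 5 FPS corresponds to about 600. This is one reason a lower hint may be reasonable for slow-moving content such as meetings.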

Structured Extraction

The analysis output is freeform and adapts to your prompt. Typical outputs include:

Extracted Elements

  • Bug reproduction steps and observed errors
  • UI walkthrough notes and navigation paths
  • Design feedback or UX issues shown on screen
  • Suggested fixes or follow-up tasks

Analysis Artifacts

Video analysis produces artifacts stored with the job:

  • analysis_summary: Text summary stored in background_jobs.response
  • job_metadata: durationMs, framerate, videoPath
  • chunk_info: chunk boundaries for long recordings (when applicable)
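
As a reading aid, the artifacts listed above could be modeled with types along these lines. Only `durationMs`, `framerate`, and `videoPath` come from this page; the interface names, the camelCase wrapper fields, the contents of `ChunkInfo`, and the `isJobMetadata` helper are assumptions, not the actual schema.

```typescript
// Field names taken from the job_metadata entry above.
interface JobMetadata {
  durationMs: number;
  framerate: number;
  videoPath: string;
}

// Hypothetical: the page only says chunk_info holds chunk boundaries.
interface ChunkInfo {
  startMs: number;
  endMs: number;
}

// Hypothetical wrapper grouping the three artifacts for one job.
interface VideoAnalysisArtifacts {
  analysisSummary: string; // text stored in background_jobs.response
  jobMetadata: JobMetadata;
  chunkInfo?: ChunkInfo[]; // present only for long recordings
}

// Runtime check that an unknown value (e.g. parsed JSON) has the
// three job_metadata fields with the expected primitive types.
function isJobMetadata(value: unknown): value is JobMetadata {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const v = value as { [key: string]: unknown };
  return (
    typeof v["durationMs"] === "number" &&
    typeof v["framerate"] === "number" &&
    typeof v["videoPath"] === "string"
  );
}
```

A guard like this is useful when job metadata arrives as untyped JSON, since it lets the rest of the code rely on the fields being present.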

Key Source Files

  • desktop/src/app/components/generate-prompt/_components/video-recording-dialog.tsx
  • desktop/src/contexts/screen-recording/Provider.tsx
  • desktop/src-tauri/src/jobs/processors/video_analysis_processor.rs
  • server/src/handlers/proxy/specialized/video_analysis.rs
  • server/src/utils/multipart_utils.rs
  • server/src/clients/google_client.rs

Planning Handoff

Video analysis summaries can be incorporated into the task description for planning.

The summary can be refined with text_improvement and task_refinement before file discovery.
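
One simple way to picture the handoff is appending the summary to the task description under a labeled section. The sketch below is illustrative only: `appendAnalysisToTask` and the "Video analysis summary:" label are hypothetical, and the app may merge the text differently.

```typescript
// Combine a task description with a video analysis summary.
// The section label used here is an assumption for illustration.
function appendAnalysisToTask(taskDescription: string, analysisSummary: string): string {
  const task = taskDescription.trim();
  const summary = analysisSummary.trim();
  if (summary === "") {
    return task; // nothing to add
  }
  if (task === "") {
    return summary; // the summary becomes the whole description
  }
  return task + "\n\nVideo analysis summary:\n" + summary;
}
```

Keeping the summary in its own labeled section makes it easy for later refinement steps to tell which text came from the user and which came from the recording.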

Continue to video analysis

Learn more about how video frames are analyzed.