Voice Transcription
Recording lifecycle, device management, and streaming behaviour for voice-driven prompts.
Voice transcription is available anywhere the desktop app exposes dictation controls, including the plan terminal and prompt editors. The feature records audio locally, sends chunks to the transcription service, and inserts recognised text into the active input field without blocking manual typing.
Recording workflow
The recording hook maintains a state machine with four states: idle, recording, processing, and error. It tracks recording duration, manages silence detection, and stops recordings automatically after ten minutes. Audio chunks are buffered and forwarded to the transcription action, which returns the recognised text for insertion.
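The state transitions described above can be sketched as a small pure function. This is a minimal illustration, not the app's actual hook: the event names, `nextState`, and `MAX_DURATION_MS` are hypothetical, and the sketch assumes that hitting the ten-minute cap moves the recorder into the processing state in the same way a manual stop would.

```typescript
// Hypothetical sketch: only the four states and the ten-minute cap come from the text.
type RecorderState = "idle" | "recording" | "processing" | "error";

type RecorderEvent =
  | { type: "start" }
  | { type: "stop" }
  | { type: "tick"; elapsedMs: number }
  | { type: "transcribed" }
  | { type: "fail" }
  | { type: "reset" };

// Ten minutes, expressed in milliseconds.
const MAX_DURATION_MS = 10 * 60 * 1000;

function nextState(state: RecorderState, event: RecorderEvent): RecorderState {
  switch (state) {
    case "idle":
      return event.type === "start" ? "recording" : state;
    case "recording":
      if (event.type === "stop") return "processing";
      // Assumption: the automatic stop behaves like a manual stop.
      if (event.type === "tick" && event.elapsedMs >= MAX_DURATION_MS) return "processing";
      if (event.type === "fail") return "error";
      return state;
    case "processing":
      if (event.type === "transcribed") return "idle";
      if (event.type === "fail") return "error";
      return state;
    case "error":
      return event.type === "reset" ? "idle" : state;
  }
}
```

Keeping the transition logic pure makes the ten-minute cap easy to test without a real timer or microphone; a hook would feed it `tick` events from whatever clock it already uses to track duration.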
Project-aware settings
When a recording session starts, the hook looks up the active project's transcription configuration. Language codes, preferred models, and other settings are retrieved before audio capture begins, so each recording follows the project's preferences.
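One way to picture the lookup is a resolver that runs before capture starts. Everything named here is an assumption for illustration: the `TranscriptionSettings` shape, the `DEFAULT_SETTINGS` values, the in-memory `Map` standing in for the real settings store, and the fallback to defaults when a project has no configuration (the text does not say what happens in that case).

```typescript
// Hypothetical settings shape; the source only mentions language codes and preferred models.
interface TranscriptionSettings {
  languageCode: string;
  model: string;
}

// Assumed fallback values, used when a project does not override a field.
const DEFAULT_SETTINGS: TranscriptionSettings = {
  languageCode: "en",
  model: "default",
};

function resolveSettings(
  projectId: string | null,
  store: Map<string, Partial<TranscriptionSettings>>,
): TranscriptionSettings {
  // No active project, or no stored configuration: fall back to an empty override set.
  const overrides: Partial<TranscriptionSettings> =
    (projectId !== null ? store.get(projectId) : undefined) ?? {};
  return {
    languageCode: overrides.languageCode ?? DEFAULT_SETTINGS.languageCode,
    model: overrides.model ?? DEFAULT_SETTINGS.model,
  };
}
```

Resolving the full settings object once, up front, means the capture and chunk-forwarding code never has to handle a partially configured project.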
Device management
The feature requests microphone permission, enumerates available audio inputs, and lets users switch devices during a session. Audio levels are monitored live so the UI can surface silence warnings if the microphone is muted or disconnected.
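The silence warning can be illustrated with a level monitor that works on raw sample frames. This is a sketch under stated assumptions: the RMS measure, the `0.01` threshold, and the three-frame window are illustrative choices, and `createSilenceMonitor` is a hypothetical helper, not the app's API. In a browser-based shell the frames would typically come from the Web Audio API, which is left out here so the example stays self-contained.

```typescript
// Root-mean-square level of one frame of samples in the range [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical helper: returns a function that reports true once the input has
// stayed below `threshold` for `framesBeforeWarning` consecutive frames.
function createSilenceMonitor(threshold = 0.01, framesBeforeWarning = 3) {
  let silentFrames = 0;
  return (frame: Float32Array): boolean => {
    // Any frame above the threshold resets the counter.
    silentFrames = rms(frame) < threshold ? silentFrames + 1 : 0;
    return silentFrames >= framesBeforeWarning;
  };
}
```

Requiring several consecutive quiet frames avoids flashing a warning during ordinary pauses in speech, while a muted or disconnected microphone still trips it quickly.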
Multi-destination routing
Transcribed text can be routed to one of three destinations: (1) task description editors, where text is inserted at the cursor and immediately refined by text_improvement; (2) the terminal dictation buffer for command execution (for example, 'run npm test' is typed into the PTY); and (3) meeting notes mode, where an accumulated buffer is auto-saved to SQLite and task_refinement generates actionable tasks from it. Because insertion goes through the insertTranscript callback, the recording hook stays decoupled from any particular destination. The routing destination is stored in the job metadata so it can be audited later.
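The callback-based routing might look roughly like the following. Only the `insertTranscript` name and the three destinations come from the text; the `Destination` union, the `JobMetadata` shape, and `routeTranscript` are hypothetical and exist only to show how a destination can be chosen and recorded without the recorder knowing anything about editors, terminals, or note storage.

```typescript
// Hypothetical identifiers for the three destinations described in the text.
type Destination = "task_editor" | "terminal" | "meeting_notes";

// The callback shape is an assumption; the source only names insertTranscript.
type InsertTranscript = (text: string) => void;

// Assumed minimal metadata record; the real job metadata likely holds more fields.
interface JobMetadata {
  destination: Destination;
}

function routeTranscript(
  text: string,
  destination: Destination,
  handlers: Record<Destination, InsertTranscript>,
): JobMetadata {
  // The recorder only calls the callback; each handler owns its own side effects.
  handlers[destination](text);
  // Returning the destination lets the caller persist it alongside the job.
  return { destination };
}
```

Typing `handlers` as `Record<Destination, InsertTranscript>` makes the compiler reject a new destination that has no handler registered.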
Usage examples
Common voice transcription workflows include: sprint planning (meeting recording → transcription → task_refinement → task descriptions); terminal commands (dictation → transcription → terminal input → execution); bug reports (verbal description → task editor → text_improvement → task_refinement); and architecture discussions (video + audio → vision analysis + transcription → combined text_improvement).