Stage 1 & Stage 5 Voice Input

Voice transcription for specification capture and terminal control

Dictate tasks while thinking aloud → clean, structured requirements. Dictate terminal commands while keeping your eyes on the code. Voice transcription respects per-project settings and integrates with the text_improvement and task_refinement prompts.

Why Voice Accelerates Specification Capture

Capture ideas before they fade

Stakeholders think faster than they type. Requirements and context get lost while fingers catch up. Voice lets you capture the complete specification before critical details fade.

Hard to describe while hands are busy

Reviewing code? Debugging? Drawing architecture diagrams? Your hands are occupied but you need to log the task. Voice transcription keeps you in flow.

Context switching kills momentum

Stop what you are doing to open a note app, type, then return. Every switch breaks concentration. Voice stays in the same workspace.

Key Capabilities

Multiple Language Support

OpenAI transcription supports multiple languages. Respects per-project settings configured in your workspace.

Per-Project Configuration

Set project defaults for language, temperature, and transcription model. Integrates with text_improvement and task_refinement prompts.
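As a hedged illustration of how such per-project defaults might map onto a transcription request via the OpenAI Python SDK — PROJECT_SETTINGS and transcription_params are hypothetical names for this sketch, not PlanToCode's actual configuration schema:

```python
from typing import Any

# Hypothetical per-project defaults (illustrative only; not PlanToCode's
# actual settings schema).
PROJECT_SETTINGS = {
    "language": "en",
    "temperature": 0.0,
    "model": "gpt-4o-transcribe",
}

def transcription_params(settings: dict, audio_path: str) -> dict[str, Any]:
    """Map project defaults onto OpenAI transcription request parameters."""
    return {
        "model": settings.get("model", "gpt-4o-transcribe"),
        "language": settings.get("language", "en"),
        "temperature": settings.get("temperature", 0.0),
        "file": audio_path,
    }

params = transcription_params(PROJECT_SETTINGS, "standup_note.wav")

# The actual call via the OpenAI Python SDK (network call, needs OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# with open(params.pop("file"), "rb") as f:
#     transcript = client.audio.transcriptions.create(file=f, **params)
```

Keeping the settings-to-request mapping in one place is what lets every project dictate in its own language and model without per-recording configuration.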

Terminal Dictation (Stage 5)

Dictate commands directly to your terminal session. Keep your eyes on the code while controlling the terminal with your voice.

Accuracy Benchmarks

What is Word Error Rate (WER)?

WER = (Substitutions + Deletions + Insertions) / Number of reference words. Lower is better.

  • Substitution: a word is transcribed incorrectly
  • Deletion: a word is omitted
  • Insertion: an extra word is added
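As an illustration, WER can be computed with a standard word-level edit-distance (Levenshtein) alignment — a minimal Python sketch, not part of PlanToCode:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions)
    divided by the number of reference words. Lower is better."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j] (Levenshtein)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# "wal_level" misheard as two words: 1 substitution + 1 insertion over 4 words
print(wer("set wal_level to logical", "set wal level to logical"))  # → 0.5
```

Note how a single misheard identifier already costs two edit operations — exactly the kind of error that turns a precise spec into an ambiguous one.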

In technical workflows, small WER differences can flip flags, units, or constraints—creating ambiguous tickets and rework. High accuracy preserves intent and enables precise, implementation-ready specifications.

gpt-4o-transcribe shows the lowest WER in this benchmark. Even a 1–2 point absolute WER reduction removes one or two errors per hundred words — meaningful when a single flag or unit can change a specification.

About these models

  • OpenAI gpt-4o-transcribe — advanced multilingual speech model optimized for accuracy and latency.
  • Google Speech-to-Text v2 — cloud speech recognition by Google.
  • AWS Transcribe — managed speech recognition by Amazon Web Services.
  • Whisper large-v2 — open-source large-model baseline for comparison.

Bottom line: Fewer errors mean fewer ambiguous tickets and less rework. gpt-4o-transcribe helps teams capture precise, implementation-ready specifications on the first try.

Illustrative Example: Capturing Specifications

OpenAI gpt-4o-transcribe

Create a Postgres read-replica in us-east-1 with 2 vCPU, 8 GB RAM, and enable logical replication; set wal_level=logical and max_wal_senders=10.

Accurate — identifiers, units, and parameter names transcribed exactly.

Competitor Model

Create a Postgres replica in us-east with 2 CPUs, 8GB RAM, and enable replication; set wal level logical and max senders equals ten.

Errors — Substitutions: 9, Deletions: 0, Insertions: 8. Even a few errors can invert flags or units.

Impact: Mishearing "read-replica" as "replica", dropping the "-1" region suffix, or garbling "wal_level=logical" can lead to incorrect deployments or data flows.

Voice Transcription Across Stages

Stage 1: Dictating Tasks

Dictate tasks while thinking aloud. Raw voice input captures complete requirements - mental models, constraints, context - without the cognitive load of typing.

Stage 5: Terminal Voice Control

Dictate terminal commands while reviewing code, monitoring logs, or analyzing diffs. Keep your eyes on what matters while controlling execution.

Integration with Text Enhancement Prompts

Voice respects per-project language and temperature settings. Transcripts feed into text_improvement for grammar polish and task_refinement for completeness expansion.

Real Use Cases

Capture ideas hands-free (Stage 1)

Scenario:

You are deep in a debugging session. You spot three related issues that need fixing. Speak them into the voice recorder without leaving your terminal.

Outcome:

Ideas logged instantly. Return to debugging without breaking flow. Polish transcripts with text_improvement.

Dictate while reviewing code (Stage 1)

Scenario:

Code review reveals a refactoring opportunity. Your hands are on the diff, eyes on the screen. Voice captures the task description.

Outcome:

Task created with full context, zero typing, no context switch. Ready for text_improvement polish.

Faster task entry for repetitive work

Scenario:

You have 10 similar bugs to log after QA testing. Typing each one takes 2 minutes. Voice transcription takes 20 seconds.

Outcome:

Roughly 6x faster task entry (20 seconds vs. 2 minutes per bug). QA feedback processed in minutes instead of hours. Refine with task_refinement before file discovery.

Terminal commands without looking away (Stage 5)

Scenario:

Monitoring build output when you need a complex docker command. Dictate it while watching the logs — the transcribed command is inserted into your terminal.

Outcome:

Commands entered correctly while eyes stay on logs. Stage 5 terminal control without context switching.

Frequently Asked Questions

Everything you need to know about PlanToCode

Can stakeholders review implementation plans before execution?

Yes. PlanToCode provides a human-in-the-loop workflow where team leads and stakeholders can review generated implementation plans, edit details, request modifications, and approve changes before they are executed by coding agents or developers. This ensures corporate governance and prevents regressions.

Can PlanToCode extract specifications from meeting recordings?

Upload Microsoft Teams meeting recordings or screen captures to PlanToCode. Advanced multimodal models analyze both audio transcripts (including speaker identification) and visual content (shared screens, documents) to extract specification requirements. You review the extracted insights — decisions, action items, discussion points — and incorporate them into implementation plans.

Do implementation plans show exactly which files will change?

Yes. Implementation plans break down changes on a file-by-file basis with exact repository paths corresponding to your project structure. This granular approach ensures you know exactly what will be modified before execution, providing complete visibility and control.

Refine Voice Transcripts Before File Discovery

Voice transcription is Stage 1 input in Intelligence-Driven Development. After capturing raw thoughts, text_improvement cleans grammar and task_refinement expands completeness - preparing specs for Stage 2 file discovery.

Text Enhancement (Stage 1)

Polish grammar, improve clarity, and enhance readability while preserving your original intent. Makes voice transcripts professional.

Task Refinement (Stage 1 → 2)

Expand descriptions with implied requirements, edge cases, and technical considerations. Prepares specs for FileFinderWorkflow.
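A minimal sketch of that Stage 1 → Stage 2 hand-off. The two functions below are placeholders for the LLM-backed text_improvement and task_refinement prompts — trivial string logic stands in for the actual model calls:

```python
def text_improvement(transcript: str) -> str:
    """Placeholder for the text_improvement prompt: the real step asks an
    LLM to fix grammar and clarity while preserving the speaker's intent."""
    return transcript.strip().capitalize()

def task_refinement(spec: str) -> str:
    """Placeholder for the task_refinement prompt: the real step expands the
    spec with implied requirements and edge cases before file discovery."""
    return spec if spec.endswith(".") else spec + "."

raw_transcript = "add retry logic to the payment webhook handler"
spec = task_refinement(text_improvement(raw_transcript))
print(spec)  # → "Add retry logic to the payment webhook handler."
```

The point is the ordering: grammar polish first, completeness expansion second, so file discovery receives a spec that is both readable and complete.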

Related Features

Discover more powerful capabilities that work together


Voice to Terminal Commands

Speak naturally, execute precisely. No more typing complex commands.


AI File Discovery for Smart Context

AI finds the files that matter for your task


Multi-Model Planning Synthesis

Get the best insights from GPT-5.2, Claude, and Gemini combined


Start Capturing Specifications with Voice

Stage 1 specification capture with voice, then refine with AI. Stage 5 terminal control while reviewing code. Voice transcription bridges thinking and execution across the workflow.
