Meeting Transcription vs Dictation: Why You Need Both
If you've looked for voice tools for macOS, you've probably noticed they split into two distinct camps: apps that transcribe your meetings, and apps that let you dictate into other software. Almost nobody does both. That's a problem, because professionals who need one usually need the other.
What Meeting Transcription Does
Meeting transcription is about capture at scale. You join a call — Zoom, Teams, Google Meet, or a plain phone call — and you want a record of everything said. The app needs to handle multiple speakers, separate "Me" from "Them", deal with crosstalk, and produce a legible transcript you can review later.
The key requirements here are:
- Dual-stream audio. Capturing your microphone and the system audio (other speakers) simultaneously.
- Speaker diarization. Labeling who said what.
- Long-form accuracy. Staying accurate over a 60-minute meeting, not just short bursts.
- Post-meeting processing. AI summaries, action items, export to Markdown or PDF.
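One nice consequence of dual-stream capture is that two-party speaker labeling falls out almost for free: anything on the mic channel is "Me", anything on the system channel is "Them", and the transcript is just the two streams interleaved by timestamp. Here's a minimal sketch of that merge — the `Segment` type and function names are illustrative, not any particular tool's API:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds from the start of the meeting
    text: str

def merge_streams(mic: list[Segment], system: list[Segment]) -> list[str]:
    """Interleave per-channel transcript segments by start time.

    Because mic and system audio arrive on separate streams, "Me" vs
    "Them" is just a channel label here -- no acoustic diarization is
    needed for the simple two-party case. Calls with more than two
    speakers on the system channel still need real diarization.
    """
    labeled = [(s.start, "Me", s.text) for s in mic] + \
              [(s.start, "Them", s.text) for s in system]
    return [f"[{label}] {text}" for _, label, text in sorted(labeled)]
```

Distinguishing multiple remote speakers from each other still requires model-based diarization, but the Me/Them split is the part that depends on capturing both streams in the first place.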
Tools like Otter.ai, Fireflies, and Grain are built specifically for this use case. They're good at it — but that's all they do.
What System-Wide Dictation Does
Dictation is about speed at the cursor. You hold a hotkey, speak a sentence, and the text appears wherever your cursor is — a Slack message, a code comment, a terminal command, an email reply. It needs to be fast (sub-500ms), accurate on short utterances, and frictionless. The key requirements here are:
- Low latency. Any delay over 500ms breaks the flow of thought.
- System-wide access. Works in any app without requiring special integrations.
- Push-to-talk or toggle. Hotkey-based activation that doesn't interfere with normal typing.
- Voice commands. "New line", "delete that", basic editing without reaching for the keyboard.
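At its core, voice-command handling is a small dispatch step in front of plain text insertion: match a few reserved phrases, treat everything else as dictation. A toy sketch of that idea — the command set and behavior here are hypothetical simplifications ("delete that" in real tools usually removes the last utterance, not the last word):

```python
def apply_command(buffer: str, phrase: str) -> str:
    """Apply one transcribed phrase to the current text buffer.

    Reserved phrases become edit actions; anything else is appended
    as ordinary dictated text.
    """
    cmd = phrase.strip().lower()
    if cmd == "new line":
        return buffer + "\n"
    if cmd == "delete that":
        # Simplified: drop the last whitespace-separated word, if any.
        words = buffer.rstrip().rsplit(" ", 1)
        return words[0] if len(words) == 2 else ""
    # Ordinary dictation: append with a space unless at a boundary.
    sep = "" if (not buffer or buffer.endswith(("\n", " "))) else " "
    return buffer + sep + phrase.strip()
```

The hard part in production is everything around this function — deciding when a phrase is a command versus literal text the user wants typed — but the dispatch structure is the same.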
macOS has built-in dictation, but it's slow and unreliable for professional use. Third-party tools like Wispr Flow (formerly Whisper Flow) address this — but again, dictation only.
Why They're Usually Separate
The two use cases have genuinely different technical architectures. Meeting transcription needs multi-channel audio, long buffers, and offline processing. Dictation needs a real-time pipeline with minimal buffering.
Running both simultaneously is harder still. The dictation microphone can interfere with meeting audio capture. The transcription engine (typically Whisper Large) is too slow for real-time dictation. Most developers pick one problem and solve it well — leaving professionals to juggle two apps, two subscriptions, and two mental models.
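The buffering difference alone illustrates the architectural split. A real-time pipeline has to hand the model tiny frames as they arrive so first text can appear within a few hundred milliseconds; the meeting pipeline can accumulate minutes of audio and process it after the fact. A rough sketch, with illustrative numbers (16 kHz sample rate, 200 ms frames — not any specific engine's parameters):

```python
def streaming_chunks(samples: list[int], rate: int = 16_000, frame_ms: int = 200):
    """Real-time path: yield small fixed-size frames as audio arrives,
    keeping end-to-end latency in the hundreds of milliseconds."""
    frame = rate * frame_ms // 1000
    for i in range(0, len(samples), frame):
        yield samples[i:i + frame]

def batch_buffer(samples: list[int]) -> list[list[int]]:
    """Meeting path: accumulate the whole recording (or long windows)
    and transcribe offline, trading latency for accuracy."""
    return [samples]  # one big buffer, processed after the fact
```

A model tuned for the second shape — long context, heavy decoding — simply can't be dropped into the first, which is why a single engine rarely serves both modes.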
The Combined Approach
Echoic uses different models for different jobs. Meeting transcription uses Parakeet v3 or Whisper Large — high-accuracy models for long-form audio. Dictation uses Moonshine v2 — compact, optimized for real-time latency.
You can hold your dictation hotkey to type in Slack while a meeting is recording in the background. Echoic handles the audio routing so both streams stay clean.
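Serving two consumers from one microphone is, structurally, a fan-out: each captured buffer is copied to every subscriber so the dictation and meeting pipelines never contend for the same stream. This is a generic sketch of that pattern, not Echoic's actual routing code:

```python
from queue import Queue

class MicFanout:
    """Broadcast each mic buffer to every subscriber, so the dictation
    and meeting pipelines each read an independent copy of the stream."""

    def __init__(self) -> None:
        self.subscribers: list[Queue] = []

    def subscribe(self) -> Queue:
        q: Queue = Queue()
        self.subscribers.append(q)
        return q

    def push(self, buffer: bytes) -> None:
        for q in self.subscribers:
            q.put(buffer)
```

Each pipeline then drains its own queue at its own pace — the dictation side in near real time, the meeting side in long accumulating windows.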
When You Actually Need Both
- Morning standup → transcribed, action items extracted
- Async Slack reply → dictated directly into the compose box
- Customer call → transcribed, summary shared with the team
- Email to a client → drafted by dictation while reviewing notes
- Code comment → dictated in place without breaking out of the editor
- Claude Code / Codex prompt → described out loud instead of typed out
- Bug explanation to an AI assistant → faster to speak than to write
Keeping these in separate apps creates friction — different hotkeys, different windows, different mental models. One app that handles both modes reduces that friction significantly.
Dictation for AI-First Development
AI coding tools — Claude Code, Codex, Cursor, ChatGPT — are fundamentally prompt-driven. Your words are the code. The better you describe your intent, the better the output. And yet most developers type every prompt, even long ones.
Speaking is three to four times faster than typing. For a short prompt that's a minor difference. For a detailed description — "refactor this authentication module to use JWT, extract the token validation logic into a separate utility, and update the tests to reflect the new structure" — the gap is significant. More importantly, speaking lets you think out loud. You naturally add context, catch edge cases mid-sentence, and explain why rather than just what. Typed prompts tend to be terser and less precise.
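The back-of-envelope arithmetic is easy to check. Using commonly cited averages — roughly 150 words per minute for conversational speech and 40 wpm for typing, both assumptions rather than measurements:

```python
def seconds_to_enter(words: int, wpm: float) -> float:
    """Time to produce `words` words at a given words-per-minute rate."""
    return words * 60 / wpm

prompt_words = 25                               # about the length of the refactor prompt above
spoken = seconds_to_enter(prompt_words, 150)    # ~150 wpm conversational speech (assumed)
typed = seconds_to_enter(prompt_words, 40)      # ~40 wpm average typing (assumed)
# spoken -> 10.0 seconds, typed -> 37.5 seconds: a 3.75x gap
```

Ten seconds versus nearly forty is the difference between finishing a thought and losing it halfway through.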
Because Echoic works system-wide via the macOS Accessibility API, dictation lands wherever your cursor is — the Claude Code terminal, a Cursor chat panel, the ChatGPT browser tab, a GitHub PR comment. No integrations, no clipboard steps. Hold the hotkey, speak, release. The text appears.
Practical AI development workflows that work well with voice:
- Feature description: "Add a settings panel that lets the user configure the model, API key, and temperature. Use the existing GlassCard component for the container."
- Bug report: "The dictation hotkey stops responding after the screensaver activates. It starts working again after the user manually triggers a transcription."
- Refactor request: "Extract all the hardcoded color values into a theme file and replace inline styles with CSS variables."
- Code review: "Explain what this function does and flag any edge cases I might be missing."
The cognitive overhead of switching from thinking to typing breaks the flow. Voice removes that switch.
The Trade-Off
If you only attend one meeting a week and never dictate, a dedicated meeting tool may suit you better. If you only dictate, a focused dictation app is simpler.
But if you do both regularly — which describes most knowledge workers — the combined approach wins. That's the bar Echoic is trying to clear.