
Windows includes free, built-in speech to text, and most people who would benefit from it don't know it's there. The confusion usually comes from not knowing which of the three available tools to use. Voice Typing, Windows Speech Recognition, and Voice Access each do different things, and Microsoft has never explained the differences clearly in one place.
This guide does that. It covers how to start dictating in under a minute, what each tool handles, how to improve accuracy, and where third-party apps make a meaningful difference.
Windows offers three distinct speech input tools, each with a different purpose:
Voice Typing (Win+H): The primary dictation tool introduced in Windows 10 and significantly improved in Windows 11. Converts speech to text in real time in any app that accepts keyboard input — Word, Notepad, Outlook, browsers, chat apps. This is what most users need for day-to-day dictation.
Windows Speech Recognition (WSR): The older system, available since Windows Vista. WSR handles both dictation and full voice control of Windows — you can open programs, click buttons, navigate menus, and fill out forms entirely by voice. More powerful than Voice Typing for accessibility use cases, but requires voice training for best accuracy.
Voice Access: The newest addition, available in Windows 11 22H2 and later. A modernized take on full voice control, designed with cleaner command syntax and better integration with Windows 11 UI elements. Gradually replacing the role WSR used to fill.
For typing text into documents and apps, Voice Typing is the right starting point — it's the simplest to use and doesn't require setup or training.
Windows 11 (recommended path):
1. Click Start > Settings > Accessibility > Speech.
2. Toggle Windows Speech Recognition on if you want the full control option.
3. For Voice Typing, no setup is needed: just press Win+H.
Windows 10:
1. Click Start > Settings > Time & Language > Speech.
2. Follow the setup prompts to confirm your microphone is recognized.
3. Press Win+H in any app to launch Voice Typing.
Microphone setup (important): Voice Typing accuracy is directly tied to microphone quality and positioning. Before your first session, check that the correct microphone is set as the default in Settings > System > Sound > Input. A headset or dedicated USB microphone outperforms a laptop's built-in mic for dictation.
Win+H is the fastest path to speech to text on Windows — no menus, no searching, just the keyboard shortcut in any text field.
What happens when you press Win+H:
1. A small floating toolbar appears near the top of the screen.
2. Click the microphone icon (or say "Start listening").
3. Speak normally. Text appears in whatever text field has focus.
4. Say "Stop listening" or click the mic again to pause.
Useful voice commands while dictating:
- "Delete that": removes the last phrase spoken
- "Undo that": reverses the last action
- "New line": inserts a line break
- "Period" / "Comma" / "Question mark": inserts punctuation
- "Stop listening": pauses dictation
| Use Case | Best Tool |
|---|---|
| Typing text into Word, email, Notepad | Voice Typing (Win+H) |
| Dictating while keeping hands free from keyboard | Voice Typing (Win+H) |
| Controlling Windows by voice (opening apps, clicking) | Voice Access (Win 11) |
| Full hands-free PC operation including legacy apps | Windows Speech Recognition |
| Accessibility: operating computer without hands | Voice Access or WSR |
Voice Typing is faster to start, requires no training, and is sufficient for the majority of dictation use cases. Its main limitation: it doesn't control Windows itself — you can't open programs, click buttons, or navigate menus with it.
Windows Speech Recognition is more powerful but slower to learn. The command vocabulary is more complex, and it benefits from the built-in voice training session (Control Panel > Ease of Access > Speech Recognition > Train your computer to better understand you).
Voice Access is the modern alternative to WSR for Windows 11 users, with a more intuitive command structure. Enable it via Settings > Accessibility > Voice Access.
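The comparison above boils down to a short decision rule, sketched here in Python purely as an illustration. The function name and parameters are hypothetical, and `windows_version=11` is assumed to mean Windows 11 22H2 or later, since Voice Access is not available before that release.

```python
def pick_speech_tool(needs_pc_control: bool,
                     windows_version: int,
                     legacy_apps: bool = False) -> str:
    """Suggest a built-in Windows speech tool (illustrative helper only).

    needs_pc_control: True if you want to open apps, click, and navigate by voice.
    windows_version:  10 or 11 (11 is assumed to mean 22H2 or later).
    legacy_apps:      True if full voice control must also cover older apps.
    """
    # Plain dictation never needs the heavier voice-control tools.
    if not needs_pc_control:
        return "Voice Typing (Win+H)"
    # Voice Access exists only on Windows 11; legacy apps fall back to WSR.
    if windows_version >= 11 and not legacy_apps:
        return "Voice Access"
    return "Windows Speech Recognition"


print(pick_speech_tool(needs_pc_control=False, windows_version=10))
print(pick_speech_tool(needs_pc_control=True, windows_version=11))
```

For accessibility use on Windows 11, either Voice Access or WSR can work. The sketch simply prefers Voice Access unless legacy apps are involved.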
Voice Typing accuracy out of the box is solid for standard English in a quiet environment. These steps push it further:
Microphone positioning: Hold the mic or position the headset so it's 2–4 inches from your mouth. Laptop microphones placed on a desk introduce significant room noise and distance-related accuracy drops.
Quiet environment: Background noise (TV, open windows, air conditioning) reduces accuracy. If you dictate regularly, a quiet room produces noticeably better results than a noisy one.
Speech Recognition training (WSR): If you're using Windows Speech Recognition, run the voice training wizard (Control Panel > Ease of Access > Speech Recognition > Train your computer to better understand you). Reading the training prompts for 15–20 minutes calibrates the engine to your voice profile.
Technical vocabulary: Voice Typing does not support custom vocabulary additions, and WSR offers only a basic Speech Dictionary for adding individual words. For specialized terms (legal, medical, technical), say the words slowly and clearly on first use and correct any misrecognitions, since the engine often learns from corrections. Dragon Professional remains the only Windows option with a true custom vocabulary feature.
Punctuation by voice: Speak punctuation explicitly ("period", "comma") when auto-punctuation misses it. Windows 11's Voice Typing has improved auto-punctuation, but explicit commands are more reliable for formal documents.
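To make the idea of spoken punctuation concrete, here is a minimal Python sketch of how a raw transcript containing command words could be turned into punctuated text. It is an illustration under stated assumptions, not how Windows implements it: the `SPOKEN_COMMANDS` table and the `apply_spoken_commands` function are hypothetical, and the real engine recognizes far more phrases.

```python
import re

# Hypothetical command table for illustration only. It covers just the
# punctuation and line-break commands mentioned in this guide.
SPOKEN_COMMANDS = {
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "new line": "\n",
}


def apply_spoken_commands(transcript: str) -> str:
    """Replace spoken command phrases in a raw transcript with their symbols."""
    # Try longer phrases first so "question mark" is matched as one unit.
    phrases = sorted(SPOKEN_COMMANDS, key=len, reverse=True)
    # The leading \s* swallows the space before the command word, so the
    # symbol attaches directly to the preceding word.
    pattern = re.compile(
        r"\s*\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    text = pattern.sub(lambda m: SPOKEN_COMMANDS[m.group(1).lower()], transcript)
    # Drop spaces left over at the start of a new line.
    return re.sub(r"\n[ \t]+", "\n", text)


print(apply_spoken_commands("hello comma how are you question mark"))
```

A naive table like this replaces every occurrence, so a sentence such as "the trial period ended" would be mangled. That ambiguity is one reason explicit commands and auto-punctuation can disagree, and why proofreading formal documents still matters.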
| Tool | Accuracy | Custom Vocabulary | Cost | Best For |
|---|---|---|---|---|
| Windows Voice Typing | Good | No | Free | Everyday dictation |
| Windows Speech Recognition | Good | Basic | Free | Full PC voice control |
| Dragon Professional | Excellent | Yes | ~$500 one-time | Legal, medical, power users |
| OpenAI Whisper (via app) | Excellent | No | Free (open-source) | Offline, privacy-sensitive use |
| Google Docs Voice Typing | Good | No | Free | Typing in Google Docs |
| Otter.ai | Good | No | Freemium | Meeting transcription |
When the built-in tool is enough: For everyday notes, emails, messages, and document drafting in English, Windows Voice Typing handles the workload. The accuracy gap between it and paid tools has narrowed in Windows 11.
When to upgrade: Dragon Professional is the only meaningful upgrade for users who dictate 3+ hours daily, need custom vocabulary for specialized terminology, or require near-perfect accuracy for formal documents. The price is steep but the accuracy on technical content is noticeably better.
Whisper as a free alternative: OpenAI's Whisper model is available as open-source software and through several Windows desktop apps. It runs locally (no cloud upload), often handles accented English well, and is free. The trade-off is that it offers no live dictation: it transcribes audio files after recording, not in real time.
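For readers comfortable with Python, the file-based workflow described above might look like the following sketch. It assumes the open-source `openai-whisper` package and `ffmpeg` are installed. The helper names `format_segments` and `transcribe_file` are hypothetical, and the `"base"` model size is only an example.

```python
def format_segments(segments: list[dict]) -> str:
    """Render Whisper-style segments as '[mm:ss] text' lines."""
    lines = []
    for seg in segments:
        # Each segment carries a start time in seconds and the recognized text.
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a recorded audio file locally and return timestamped text."""
    # Imported lazily so the formatting helper works without the package.
    import whisper  # third-party: pip install openai-whisper (needs ffmpeg)

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)   # processing happens on this machine
    return format_segments(result["segments"])


# Usage, with a placeholder file name for a recording you made earlier:
# print(transcribe_file("meeting_notes.wav"))
```

Larger models are generally more accurate but slower, so a small model is a reasonable starting point on a laptop without a dedicated GPU.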
Windows speech to text doesn't require any downloads or subscriptions — press Win+H, speak, and it works. For most users, Voice Typing covers everyday dictation needs across all major apps. If you need hands-free control of Windows itself, Voice Access (Windows 11) or Windows Speech Recognition provides that.
Third-party tools only make sense at the margins: Dragon for professional high-volume users, Whisper for privacy-conscious users who want local processing, and Otter.ai for meeting transcription that needs speaker identification and summaries.
If you're also looking for a text-to-speech tool for the reverse workflow — having Windows read text back to you — AI Listen handles that on iOS and complements a Windows dictation workflow when you're switching between devices.






