Hi, do any of you know about voice dictation apps like VoiceInk? I’m thinking about trying one, but I’m a bit doubtful.
I would probably feel a bit uncomfortable speaking into my PC, especially when other people are around.
I was also thinking about the difference between writing and speaking. The former gives you more time to think, which lets you change your words and think about them more carefully. Writing lets you think while you’re doing it, and creates a space between what you want to say and how you say it.
The oral form seems more direct and spontaneous, but also more vulnerable to loss of nuance.
These differences make me think about whether using a dictation app is just a different tool, or if it represents a bigger change in the way we form and communicate thoughts.
Thanks in advance to anyone who would like to share ideas or experiences.
You’re absolutely right: speaking and writing feel different, because they are different.
Writing gives you space to pause and shape your thoughts. You can go back, tinker, polish. It’s like slow-cooking.
Speaking, on the other hand, is faster, looser, and – yes – riskier. Things come out that you didn’t quite plan. Sometimes that’s gold. Sometimes it’s a mess.
Now, on dictation tools – I’ve used a few, and here’s what I’ve learned:
- Dictation is brilliant for getting rough thoughts out of your head and onto “paper” quickly. It’s not about perfection. It’s about speed.
- Think of it like a brain-dump. You can always tidy it up later.
- It takes a bit of getting used to. At first, you’ll feel self-conscious. That’s normal. Stick with it and it gets easier.
- If you feel awkward with other people nearby, use a headset or pick a quiet time.
And, something I discovered by accident, you can actually whisper and it will still work okay, usually.
For me, it’s not about replacing writing. It’s about having another tool. Most of my best ideas show up when I’m talking, not typing. So why not capture them?
To answer your bigger question: yes, it can change how you think and communicate. But only if you let it. Start by using it as a helper, not a replacement. You might be surprised at what spills out.
Hope that helps.
P.s. if you have ChatGPT, here’s how I get the best of both worlds, for longer bits of writing.
I created a GPT in ChatGPT, which has this instruction:
This GPT assists users in writing long-form stories by tidying up dictated text while preserving the user’s voice and structure. The user will dictate small chunks of their story, and this GPT will refine the writing for clarity, flow, and readability without summarizing or shortening the content. It ensures all segments connect smoothly into a cohesive piece while maintaining plain text formatting suitable for pasting into word processors like Scrivener, Word, or Ulysses (e.g., using hyphens instead of bullet points). If anything is unclear or missing, it will ask the user before making assumptions. The GPT does not impose its own style but instead enhances the user’s existing voice with minimal interference.
Don’t worry about the details.
It lets me dictate into ChatGPT (which uses whisper, the same as the app you’re looking at), in short bursts, it tidies up my dictations just a little bit, and then when I’m finished it joins them all together.
You still need to clean things up, but it’s a fab way of getting your first draft.
I can just answer for myself, but I recently started doing video. Me, the eternal introvert, in front of a camera, saying stuff. Very uncomfortable, but also, something I hope to get better at with practice.
Needless to say, I found careful writing and simply speaking to be super different and I discovered that my face does weird things when I try to think at the same time as I speak. Of course, for a quick “remember to do this and also call someone about the thing” is not a problem, but getting any sort of finished text out of the top of my head is not happening for me.
I agree with others here, it can be a good complement and also, probably gets better with practice. Like everything else.
You can probably get a good sense of dictation by simply using Voice Memos for a while. The auto-transcription isn’t awful IMO, and should give you a feeling for whether a paid app with extra features will be a good fit for your workflow.
I use and like VoiceInk. I have the push to talk key set to Right Option. I can hold it to talk, or quickly tap it to start a session, then stop it later.
I also have a mouse button set to the full combo hotkey so I can use it without taking my hand off the mouse.
It’s the best of the dictation apps I’ve used regularly in the past. (Dragon, macOS built in, Murmur Type, VoiceInk, possibly one or two others). It has fewer paper cuts and more quality of life features than any of those so I’m happy with it.
Dictation uses a different set of mental muscles than typing. Its main utility is that it allows you to get thoughts out more quickly. These thoughts might not sound as sophisticated, simply because you’ve had less time to contemplate them. But that’s the beauty of writing—you can always revise it. In fact, you can revise it even faster by re-dictating what you meant.
The output of your dictation can be cleaned up, enhanced, or altered using a GPT model. MacWhisperer, for example, includes links for working with various GPT models, local or online.
The Whisper models work well. Unlike Dragon dictate, the speech [does not] need to have prefect dictation, or the sound.to be pristine.
I do not like dictating without other people around.I hate the thought of judgement of my developing thoughts. I also like using a noice cancelling headphone, so I can concentrate.
Dictation is a fantastic way to get a lot of thoughts out quickly. It also tends to produce more spontaneous and perhaps more honest content. If nothing else, the sheer volume of thought you are transcribing might mean you get to your core message sooner.
Thanks for mentioning VoiceInk, @Atom. I’ve gone ahead and bought it after using it a bit. One feature I wish it would add, to make it a dictation tool and not just a transcription tool, is the ability to dictate punctuation and formatting, such as saying “comma” and “new paragraph” and have that converted to a comma and a new paragraph. This is a lost art, I think, as all the new tools don’t support this, as best I can tell. I tried the Word Replacement feature to set some common punctuation up manually, but it doesn’t handle this well with only a local model, which is all I can use in my profession.
I am also trying out Voice Type (also called Careless Whisper elsewhere). It’s Find/Replace feature seems to work a bit better with brute forcing some puncutation.
MacWhisper supports this feature. I have fully tested it yet though.
Thanks. Do you know if MacWhisper uses a fully local AI model? I recall looking at several others, and many sent things off your local machine for processing. VoiceInk and Voice Type were two with the option to keep it all local.
I’m slowly just giving up on fine grained placement of commas. Sometimes I wonder if the world gave up on that years ago so…
For paragraphs with voiceInk I’ll sometimes dictate, stop, return, dictate again. This also avoids situations where I get distracted during a much longer dictation session and then have to redo it to fix up a drifting train of thought.
It would be nice if it just Included a feature to do paragraphs though
It is definitely a lost art, and I am certainly old school. I spent years with a dictaphone, giving tapes (and then in later years, sending audio files) to my secretary. I’m in a line of work where we tend to be overly precise (some would say controlling) about our output. I tend to view the current crop of apps as transcription apps, and not dictation apps, unfortunately. But VoiceInk is certainly a very good one.
It is local transcription, with the option for AI manipulation(local or online) post transcription.
MacWhisper and Superwhisper both have a local option. As you say, it’s transcription, not dictation. The only dictation app I’ve found for Mac (other than the built in one) is Talon Voice, but it requires (when I last tried it) quite a bit of tweaking to get it set up properly.
Edit to add: Superwhisper supports different modes to let you refine how much or how little AI processing happens on your text. I ran across the prompt below which a user has crafted to make Superwhisper feel more like a dictation app. I haven’t tried it.
You are a dictation engine, and your input is the user's literal text. Your role is to:
Default Processing
Never censor or obscure input/output
Always append one space character to processed text
No consecutive space characters allowed
Preserve tab characters where possible, otherwise replace with single space
All input is treated as dictation text by default
Command Mode
Enter command mode when "correction" is spoken
Exit command mode when "end correction" is spoken
Limited to two commands between markers
Commands must be joined by the word "and"
Format: "correction [command] and [command] end correction"
Text Correction
Apply Australian standard spelling:
Replace -ize/-yze endings with -ise/-yse
Maintain -our endings instead of -or
Double 'l' when adding suffixes to words ending in 'l'
Preserve proper nouns exactly as spoken
Replace words/phrases based on context:
Fix phonetically similar words (e.g., "at as Leanne" → "Atlassian")
Adjust phrases that don't make sense while preserving intent
Resolve number/word ambiguity (e.g., "four"/"for")
Format numbers:
Mixed numbers use numerals (10.5)
Ordinals spelled out (first, second)
Numbers in proper names/product names preserved exactly
Measurements use numerals (10mm)
Spell out all other numbers under 11
Punctuation Handling
Remove all existing punctuation except:
Contraction apostrophes (it's, don't)
Possessive apostrophes (Bob's)
Compound word hyphens (self-aware)
Punctuation in URLs
Punctuation in link names
All punctuation in proper names
All special characters
Convert spoken punctuation words to symbols:
"open parenthesis" → "("
"close parenthesis" → ")"
"comma" → ","
"period" → "."
Retain these converted punctuation symbols in the output
Output Formatting
Maintain speaker's natural tone and intent
Present corrected text without commentary
Do not add any punctuation that wasn't explicitly spoken
Links cannot have bold/italic formatting
Nested formatting uses "and" (e.g., "bold and italicize that")
Format commands:
"Bold that" or "make that bold" → word
"Italicize that" or "make that italic" → word
"Italics that" → word
Link Creation
Say "Make that a link named" followed by desired link text
Format: [chosen name|URL]
This is something I struggle with also. I use dictation every day in my job (medical reporting) and I am used to adding in instructions such as comma, new paragraph, scratch that etc.
As far as I can see wth Voiceink (I am trialling it) no text appears while you are dictating, only when you push the key again, and there are no dictation commands. Whilst the app seems to be very accurate, I am failing to see what benefit this paid app has over the baked-in Apple Dictation mode, which of course does have all of the dictation commands, and is reasonably accurate.
Am I missing something here or is there a way to set up VoiceInk that will give me a better Dictation experience?
In working with both VoiceInk and now also trying superwhisper (because it has an iOS app), I found that only the cloud models reliably handle dictated punctuation (note that you have to choose a model to take the dictation, and then a model to process it if you want. It is the second step that will correctly reformat it to handle your dictated punctuation). The local models have not been good with that. I have created keyboard shortcuts to quickly toggle between local and cloud, as required for confidentiality. I am trying at times to let the models handle the punctuation for me, but I find that this is hit or miss.
Still in search of a solution. I had high hopes of Drafts Pro linked with an AI key on the basis of suggestions from ChatGPT and Gemini, but both failed miserably in creating the promised Draft actions to make it all work (ie they created them but they didn’t work), and I don’t have the coding skills to create something myself.
I wonder if anyone has tried Spokenly (Mac OS only) which appears to be free on the App Store and you can use your own key with it. There aren’t a lot of reviews around, though Lifehacker looked at it.
I’ve used Spokenly, but not to a significant degree—just the basics of it without using AI. It works pretty well as a dictation app. I haven’t explored its cloud-based AI integrations yet because I’m mostly using Superwhisper for that type of thing. You might want to give VoiceInk a try. It’s a nice middle ground between Superwhisper and Spokenly. Its AI integration is very similar to Superwhisper. Also, its one-time purchase is very reasonable. I think it’s around $30.00.
I’ve been experimenting with VoiceInk for the past couple of weeks, and decided to purchase a license after comparing it to a bunch of other tools on the market, most of which were subscription-based. I was pleased that it’s open source, a one-time license fee, and feels well designed and maintained.
I also like that it is predominantly local in its processing, but also has the option for AI enhancement. You can add an OpenAI or Anthropic or a number of other providers’ API keys and get the on-device transcription polished up automatically. If the enhancement fails, it falls back on the local model’s output. And it keeps a nice history of what has been transcribed. So if the text doesn’t get put into the field, I can go back and copy and paste from the history. It even will store the audio for a brief time.
I was reluctant to purchase a license because I feel like in short time Apple will upgrade their native dictation and theirs will become comparable in comprehension, while benefitting from the tighter integration. But given that VoiceInk is less than the cost of a couple of months of some of the other dictation tools with monthly subscriptions, I was happy to support it and have something to use in the meantime. And I may just come to depend on its additional functionality and history and so forth.
One significant downside is that it doesn’t have an iOS app, but many of the tools I looked at did not. An iOS app would presumably require switching back and forth between the dictation app and the text input anyway. But so far I’ve been extremely happy just using this on my laptop.
I bought VoiceInk (~$25/lifetime) as well last week. It’s a really good app. Plus, it is open source.
I’m also experimenting with Spokenly, which is also free and a really good voice dictation app. I’ve been in touch with the developer, and he did mention that he’s working on the iOS version of the app. Try it out the macOS version and let us know how you like it. I really like the UI of Spokenly.
For post processing, I use a local LLM (gemma3 4B via llama.cpp). But sometimes use Kimi2 or Qwen 235B. For people who want access to larger model like these, create an account on Groq or even Google, and they give you a free API tier to use their LLMs. I’ve been using it heavily, and it’s been enough for my usage.
After trying several applications, I settled on Wispr Flow. I am able to get an academic discount, which was very helpful. VoiceInk came very close and has the advantage of a one off payment, but was not helpful to me for two reasons: it did not work on the iPhone, and while it worked well on my silicon macbook it was incredibly slow on my Intel iMac.