How to convert audio to text?

Hi All

I co-host a podcast and would like to convert our interviews (audio) into text.

Would you recommend any app or service?


I use Adobe Premiere for that. I know, it’s a video editor, but it has excellent audio transcription. Just add an audio track with no video track, and it creates “subtitles”, which are a transcription you can copy out of it.

Drafts got this feature a few months ago. It works very well and is available in the free version.


I didn’t know that, I’m going to check this out.

It depends on the voices you’re working with and how you’re going to use/present the transcription. I’ve used the most because it did the best for my project at identifying the speakers. That was after testing 5-6 tools/providers. The Premiere Pro option Rob mentioned is intriguing, and it looks like Adobe has recently updated the model to run entirely offline. Drafts, or anything else based on Apple dictation, would IMO be good enough if you just need a rough text for search keywords.

Friend of the show Allison Sheridan wrote about and so I gave it a go. I was not impressed for what I needed but she had very good success and I think your use case is a lot closer than mine was.

FWIW, my use case was a recorded conversation which was 100% natural, not carefully on mic, and we stepped on each other quite a lot. That latter point was where it seemed to lose the plot the most. It probably also didn’t help there were a lot of people and place names mentioned.

Hah, nice. Otter is the Rev partner I used for one project, archiving and transcribing about 1200 hours of a three-speaker podcast over several months. It is definitely inexpensive and adds some quality-of-life on top of Rev. I would only caution that it’s not a good interface for bulk episode/file management.

This was news to me as well. So I searched the documentation…

Very cool. A couple of things I noted to be wary of:

It is best suited to transcribing recordings of a single person speaking…

So not so good for transcribing a podcast.

The transcription process works by extracting audio content from the media, breaking it into segments suitable for processing by Apple’s speech recognition APIs, and transcribing each of those segments. Due to time limits imposed by speech recognition, content longer than one minute is broken up and a separator (=== ) inserted between segments in the transcription.

That last one would be a deal breaker for me for any regular use. However, for the occasional need, it’s nice to have available in a tool I already have installed and use.
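
For the occasional transcript, the separators could probably be cleaned up after the fact. Here is a minimal sketch in Python, not anything Drafts provides. It assumes the transcript has been copied out as plain text and that each separator shows up as `===` with optional surrounding whitespace, as the documentation quote suggests. The function name `merge_segments` is just a name picked for this example.

```python
import re

# Assumption: segments are separated by "===" with optional whitespace
# or line breaks around it, as described in the Drafts documentation quote.
SEPARATOR = re.compile(r"\s*===\s*")


def merge_segments(transcript: str) -> str:
    """Join the one-minute segments into continuous text.

    Each separator is replaced by a single space; empty segments are dropped.
    """
    parts = [part.strip() for part in SEPARATOR.split(transcript)]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    sample = (
        "Welcome to the show.\n=== \n"
        "Today we talk about transcription.\n=== \n"
        "Thanks for listening."
    )
    print(merge_segments(sample))
```

This only removes the markers. Words that were cut in half at a segment boundary would still need fixing by hand.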

Indeed, I had tried Drafts before and it doesn’t work for this purpose. I’ll take a look at the other options.

The audio is quite clear, and though it has 2 hosts and 1 interviewee, there isn’t much overlapping. The caveat is that it is also not in English, which makes some services like Rev a bit more expensive. But I will still check it.

Thanks everyone!!!

I use it for this purpose for a podcast with 3-4 speakers, but it does require that I do a listen-through to add names and clean up mistakes. It’s a paid gig, so I want the result to be as close to perfect as possible. The initial chunk of work is done by Drafts, and then I go through and break up the dialog into lines, add names, punctuation, etc. It still takes time, but it’s faster than typing it all from nothing.

Otter is changing their pricing plans. If you want to use the 6,000 minutes/month plan at $13/mo, you’ll want to lock it in with an annual plan before September 27th. After that, you’ll have to pay $16/month for only 1,200 minutes of transcription.

I hope we get more competitive options. I don’t have regular use for transcription and the new limits are way too restrictive. I was happy with the free plan but they have been gradually making it less attractive even for casual users.

I recently used the free plan of Descript to create a transcript to play alongside a remote presentation. It was very easy to use & I’d recommend giving it a try.
