Otter.ai does a pretty good job of disambiguating speakers.
I used it on a couple of podcasts before the host began getting them transcribed.

I have a lot of transcribed audio in Otter. Unfortunately, they recently made their plans a lot worse for that purpose by cutting minutes/month from 6k to 1.2k. I paid for a year to buy time, but I need to figure something out.

Otter’s UI also becomes difficult to navigate with a lot of files in it. The layout is heavy on whitespace, and it uses infinite scrolling rather than showing all files at once or offering pagination that is easy to navigate and bookmark.

Finally, if you finish your archiving and downgrade to free, you can only view your 25 most recent conversations and the rest are hidden. You can export everything before downgrading, but then you lose the audio-text sync and the ability to reflow the transcript after correcting the text.

An open source, self-hostable web app that provides some of these features and integrates with a local or remote speech-to-text engine would be sweet!

Someone I know also raves about how good Otter is, and I believe them. But from my experience it suits “panel-like” discussions, not regular conversations.

As noted in another thread, I’ve been recording conversations with my Mum and have been manually transcribing them because Otter could not handle over-talking and fractured sentences. Some of the transcript was great, but when it missed, it made a total mess, including inventing a phantom third speaker.

And then I saw this tweet.

It’s not for the faint of heart, getting it installed and working, but I managed it. Like Federico, I am stunned at its output. It does not do speaker differentiation, but the accuracy of the words is incredible. There are mistakes, particularly in sections where it’s hard to even hear it myself, but if I wasn’t a stickler for accuracy, I’d release it as is and it would be very readable.

Here's the AI @zkarj is referring to.

How to set it up on your computer:

This thread’s gone full circle. :slight_smile: That’s a great use case, though.

I noodled a bit last night with speaker disambiguation for Whisper to process, but didn’t get far. I’m sure the right off-the-shelf tool exists and I just haven’t found it yet.
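
For what it's worth, one alternative to separating speakers before Whisper sees the audio is to label its output afterwards. Here is a rough sketch of that idea, not a tested pipeline. It assumes the transcript segments are dicts with `start`, `end`, and `text` keys (the shape Whisper's Python package returns in `result["segments"]`), and that some separate diarization tool has produced speaker turns. The `turns` format, the `label_segments` helper, and the `"UNKNOWN"` label are all made up for illustration.

```python
from typing import Dict, List


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return how many seconds two time ranges share (0.0 if they don't touch)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(
    segments: List[Dict],
    turns: List[Dict],
    unknown: str = "UNKNOWN",  # hypothetical fallback label
) -> List[Dict]:
    """Attach a speaker to each transcript segment.

    segments: dicts with "start", "end", "text" (seconds).
    turns:    dicts with "start", "end", "speaker" from a diarization
              tool of your choice (format assumed for this sketch).
    Each segment gets the speaker whose turn overlaps it the most.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Copy the segment so the caller's data is left untouched.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

This would still struggle with the over-talking mentioned above, since a segment that spans two speakers just gets whoever talked longest.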

There are two fairly inexpensive apps that use Apple Speech to transcribe audio:

Just Press Record

They work on-device!