At Snarkmarket, Matt Thompson often wrote about "the Speakularity," the moment when speech-to-text transcription would become good enough to turn into benign technology that people with sufficient resources take for granted, like a toaster, a refrigerator, or indoor plumbing. How close are we to the Speakularity today? Or is it still in the distance?

We're not there yet, but I think we're a great deal closer. My formulation for the Speakularity was that speech-to-text transcription would soon become fast, free, and decent. Where we are now, I think, is that it's plenty fast and decent, but it's actually still pretty expensive! Trint, one of the first of the new generation of transcription apps, costs nearly $600 per year for a starter account. Otter.ai has a decent free tier if you have a few short conversations to transcribe each month (and are using Zoom or Google Meet for them), but for anything more, you're paying $120 per year.
The tech is there, but it's still too expensive (and probably processor-intensive) to achieve ubiquity. And perhaps that's a durable limitation. The moment I wrote that essay (2011, can you believe it?) was something of a golden age for free or freemium software. But the computing and environmental costs of things we briefly treated as free have become much more visible.
I believe that with OpenAI's Whisper and its variants, we are there. There are already some open-source versions that run locally on a Mac. It's essentially free, fast, and very accurate. I think it's just a few iteration cycles away from being easily available to anyone.