The title is misleading: they're not bringing the speech engine to everyone, since you still have to pay for Google Cloud API requests. They're merely open-sourcing the Android app that calls out to the API. See the GitHub link in the article: https://github.com/google/live-transcribe-speech-engine/blob...
That's disappointing; the one exciting feature on the Pixel 4 is its offline transcription. From the headline I figured that's what this was about.
Google published research papers on offline recognition, the feature is shipping on Android phones (i.e. one can inspect a working device), there is an active OSS community for TensorFlow and lots of public work on speech recognition. Many building blocks are public for motivated researchers.
Ugh. I was so happy to read this title; I thought I could finally stop work on reverse-engineering the app's offline tflite engine. I guess this is motivation to pick that project back up.
I'm still waiting for meeting transcripts that understand who is speaking. Given how far we've come with speech recognition, I'm legitimately surprised that this fairly common use case is still omitted.
I'm not even saying it needs to name the people in the meeting. Just understand, contextually, whether a given utterance is from "person 1" or "person 2," then associate it with that label as it records.
Maybe this can help? Though Google's existing APIs might already be able to do this.
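For what it's worth, Google Cloud Speech-to-Text does offer speaker diarization: with it enabled, each recognized word comes back with a numeric speaker tag. Turning those tags into a "person 1" / "person 2" transcript is then just a small grouping step. Here's a minimal sketch of that grouping, assuming the recognizer hands you (word, speaker_tag) pairs in spoken order; the function name and data shape are mine, not any official API:

```python
def label_speakers(tagged_words):
    """Group consecutive words by speaker tag into 'person N: ...' lines.

    tagged_words: iterable of (word, speaker_tag) pairs in spoken order,
    roughly what you'd extract from a diarization-enabled recognition result.
    """
    lines = []
    current_tag = None
    current_words = []
    for word, tag in tagged_words:
        if tag != current_tag:
            # Speaker changed: flush the previous speaker's run of words
            if current_words:
                lines.append(f"person {current_tag}: {' '.join(current_words)}")
            current_tag = tag
            current_words = []
        current_words.append(word)
    if current_words:
        lines.append(f"person {current_tag}: {' '.join(current_words)}")
    return lines


# Hypothetical diarization output for a two-person exchange
words = [("hello", 1), ("there", 1), ("hi", 2), ("how", 2),
         ("are", 2), ("you", 2), ("fine", 1)]
for line in label_speakers(words):
    print(line)
# person 1: hello there
# person 2: hi how are you
# person 1: fine
```

The grouping is deliberately dumb (consecutive runs only); the hard part, assigning the tags in the first place, is what the diarization model does for you.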
While I appreciate the audio codec discussion and bandwidth-to-accuracy tradeoffs, how much of the speech recognition could be done on-device rather than shipping it off to the cloud? It's my understanding that it's a matter of installing pattern files for analyzing the audio without needing to fail over to the cloud; how many GB are we talking to be able to cover normal daily speech, assuming a minimum of jargon? For the hearing impaired, not having to hit the cloud at all seems like the best option (and you don't need to compress the audio at all or worry about cloud-trip bandwidth).
On Android: Language and Input > Google voice typing > Offline speech recognition, then ensure Wi-Fi and data are off and try shouting at textboxes (you might need to press a microphone button on your selected keyboard, unsure).
Can anyone attest to how accurate this transcription is for technical subjects? I've attempted to integrate transcription into my work life (pharma), but correcting errors related to tech jargon or acronyms/abbreviations always outweighed any benefit.
I can't speak to Google's, but Dragon Professional supports legal and medical jargon out of the box. Pricey, but for powerful offline speech recognition, that's understandable.
Dragon legal’s primary advantage is that it handles citation formatting. I don’t think I’ve ever come across a legal term it didn’t know (maybe dépeçage).
This is cool but a bit worrisome... remember the days when it was too expensive to log all audio transmissions on any platform/communication device, so you thought you had some level of privacy? Projects like PRISM might be able to do more than simply log metadata.
I've been looking for some way to transcribe my own talks: sometimes I find a turn of phrase or example during the talk that strikes me as useful while I'm giving it, but then forget it. Perhaps this can be coaxed into providing that service for me.
The ML transcription services work for giving you the gist of what's been said, if the recording is of decent quality; I should probably consider doing the same thing. If it's a recording that I want to be "perfect" (e.g. a posted podcast transcription), I still use human transcription, since cleaning up the machine transcription isn't worth my time. But if the transcription is mostly to jog your own memory, it's probably fine and much cheaper.
Just downloaded it on my antiquated LG5. No cloud key required, but the results are hilarious at best; nearly a total waste of time. I guess we will have to rely on good ole notepad (paper) for now.