Who owns the key patents for voice recognition - AT&T, Nuance, IBM? When will th...

syllogism · on Oct 9, 2014

Actually I think most of the work has been done in academia --- certainly that's where the recent deep learning stuff has come from. So, I don't think the important stuff is patented. (In general it's very hard to lock down ML improvements under patent. Once we can do it one way, and understand a little bit about what's working, we can usually replicate that performance with another technique.)

The big problem for open source speech recognition is training data.

melling · on Oct 9, 2014

First, these are two different problems to solve. Voice recognition and deep learning are different fields.

Is training really the issue for voice recognition? It has been an almost-solved problem for over a decade. Last year I saw this impressive use of Dragon NaturallySpeaking for the PC, running in a VM on a Mac, that pretty much worked for coding by voice.

https://www.youtube.com/watch?v=8SkdfdXWYaI

The developer mentioned that he didn't have any luck with Sphinx.

Xah Lee summarized the talk here: http://ergoemacs.org/emacs/using_voice_to_code.html

syllogism · on Oct 9, 2014

...Of course deep learning is not voice recognition. But the most recent advances in speech recognition have been from deep learning models, which have come from academia.

The system in the video you link is single speaker, closed vocabulary. You need massive training data for multi-speaker, open vocabulary.

IshKebab · on Oct 9, 2014

I've tried to use Sphinx, but the problem was lack of training data (you have to supply it yourself pretty much!). It did have some data that was supposed to recognise numbers, but it didn't work (I mean, it ran, but the recognition was awful even when it only had to pick between 10 options).

Training is a huge issue for voice recognition. It's the only way Google and Apple have managed to take voice recognition from "works 80% of the time, but that is still bad enough to be totally unusable" to "this actually works!". Maybe you don't remember how bad voice recognition was 10 years ago.

To give you an idea how important it is, on OS X you have the option to download data to improve offline voice recognition. It's something like 500 MB. And that's the result of the training.

mikeash · on Oct 10, 2014

I think there may be some confusion as to what "training" means. When it comes to voice recognition, it makes me (and I suspect others) think of the older software which required a user to read a bunch of text to it to train the software to your specific voice before it could do any kind of decent job understanding you. Now, everything is speaker-agnostic and works out of the box for anybody. Different kind of training.

walterbell · on Oct 9, 2014

Thanks for that pointer to the Python library which integrates with Dragon/Nuance to enable arbitrary commands, https://pypi.python.org/pypi/dragonfly/

bane · on Oct 9, 2014

https://www.youtube.com/watch?v=KyLqUf4cdwc

melling · on Oct 9, 2014

This video is not going to convince anyone to use voice recognition.

walterbell · on Oct 9, 2014

> The big problem for open source speech recognition is training data.

What exactly is needed for training - audio recordings with transcripts, human validation of recognized text?

There are successful crowdsourced efforts for proofreading of OCR'ed text. Archive.org could host a CC-licensed archive of sound & transcripts.

Recognition of the human voice is almost like writing, hopefully everyone could have access.

Edit: how much disk space would be needed - TB or PB?

albertzeyer · on Oct 9, 2014

For example, the Switchboard corpus (300h, 8 kHz, transcribed audio) is about 16GB.

That is a common size for LVCSR, and you need something around that area to get good performance (maybe minimum 100h). In academic papers by Google, they usually use their own private training data set, with e.g. 1900h. (E.g.: http://arxiv.org/pdf/1402.1128.pdf)
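As a rough back-of-the-envelope check on those sizes, here is a small sketch. It assumes uncompressed 16-bit mono PCM, which is my assumption and not something stated about the corpus above; the helper name `pcm_bytes` is made up for illustration.

```python
def pcm_bytes(hours, sample_rate_hz=8000, bytes_per_sample=2, channels=1):
    """Size in bytes of uncompressed PCM audio.

    Defaults (16-bit samples, mono) are assumptions for illustration only.
    """
    seconds = hours * 3600
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


# Hours of audio mentioned in the thread: a rough minimum, Switchboard, Google's set.
for hours in (100, 300, 1900):
    print(f"{hours:>5} h -> {pcm_bytes(hours) / 1e9:.1f} GB")
# 300 h at 8 kHz comes out at about 17.3 GB, the same ballpark as the 16GB quoted above.
```

So even the larger private data sets would be on the order of a hundred GB at 8 kHz under these assumptions, not TB or PB. Higher sample rates scale the figure linearly.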

Some crowdsourced effort to collect transcribed audio under a CC-licence would be great!

bainsfather · on Oct 9, 2014

Maybe this? http://www.voxforge.org/home - "VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac)." (caveat: I have not recorded on this from (any) of my machines - I don't have the right plugin apparently)

Maybe also: https://librivox.org - has audiobooks read by volunteers, plus the book text.

kansface · on Oct 9, 2014

The more data the better, although the relationship isn't linear.

albertzeyer · on Oct 9, 2014

One state-of-the-art framework is Kaldi, which is Open Source: http://kaldi.sourceforge.net/

You can even download trained models: http://kaldi-asr.org/

It supports many state-of-the-art methods, like DNNs, sequence training, etc. So you can get quite good results with it. To train it yourself, of course you need some good training data from somewhere.

nitin_flanker · on Oct 9, 2014

A patent has a life of 20 years from filing, and after that companies tend to extend a patent to block the competition. I don't think IBM or AT&T or any company will lose hold of any of their patents.

walterbell · on Oct 9, 2014

Isn't there a 5-year limit on the extension?

nitin_flanker · on Oct 9, 2014

A maximum of 5 years can be restored to the patent.

In all cases, the total patent life for the product with the patent extension cannot exceed 14 years from the product’s approval date, or in other words, 14 years of potential marketing time.

If the patent life of the product after approval has 14 or more years, the product would not be eligible for patent extension.

All regulatory periods are divided into a testing phase and an agency approval phase. The regulatory review period that occurs after the patent to be extended was issued is eligible to be counted towards the following calculation:

First, each phase of the regulatory review period is reduced by any time that the applicant did not act with due diligence during that phase. The reduction in time would only occur after an FDA finding that the company did not act with due diligence.

Second, after any such reduction, one-half of the time remaining in the testing phase would be added to the time remaining in the approval phase to comprise the total period eligible for extension.

Third, all of the eligible period can be counted unless to do so would result in a total remaining patent term from the date of approval of a marketing application of more than fourteen years. An additional limitation on the period of extension is that the extension cannot exceed five years. For example, if an approved drug product which is eligible for the maximum of five years of extension had ten years of original patent term left at the end of its regulatory review period, then only four of the five years could be counted towards extension. The Patent and Trademark Office is responsible for determining the period of extension.

You can read more here - http://www.fda.gov/Drugs/DevelopmentApprovalProcess/SmallBus...
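The three steps quoted above can be sketched as a small calculation. This is only an illustration of the rule as described in this comment, not legal advice or an official formula; the function and parameter names are hypothetical, and all durations are in years.

```python
def patent_term_extension(testing_years, approval_years, remaining_term_years,
                          testing_non_diligent=0.0, approval_non_diligent=0.0):
    """Extension in years, following the three steps quoted above.

    All names here are illustrative; inputs are durations in years.
    """
    # Step 1: reduce each phase by time the applicant did not act with due diligence.
    testing = max(testing_years - testing_non_diligent, 0.0)
    approval = max(approval_years - approval_non_diligent, 0.0)

    # Step 2: half of the testing phase plus the whole approval phase.
    eligible = testing / 2 + approval

    # Step 3: remaining term after approval plus extension may not exceed 14 years,
    # and the extension itself may not exceed 5 years.
    room_under_14_year_cap = max(14.0 - remaining_term_years, 0.0)
    return min(eligible, room_under_14_year_cap, 5.0)


# The example from the text: ten years of term left, so only four years can count.
print(patent_term_extension(testing_years=6, approval_years=3, remaining_term_years=10))  # 4.0
```

With 14 or more years of term remaining at approval, the same function returns 0, which matches the "not eligible" case described above.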

pbhjpbhj · on Oct 9, 2014

This just reminds me that patent terms are too long ... but perhaps save that for all the other threads complaining about the IP systems across the world.

Move along, nothing to see here.