Hello,

All of the Opus quality studies that I've seen focus on human-perceived quality. I'm interested to know of any experience with machine-"perceived" quality, particularly related to speech recognition or biometrics.

I'm also interested in folks' thoughts on optimizing Opus for ASR. For example, removing certain classes of comfort noise, filtering non-speech bands, tuned VAD, etc. One could imagine eventually rolling these updates back into the standard under an "ASR" mode.

A big part of optimizing for ASR will be an infrastructure that reports feedback on candidate improvements and facilitates regression testing. To that end, Nuance is willing to publish a service which allows developers to upload codec binaries to our computational grid and report back a score. If such a service is of interest to you, please let me know of any design constraints you have in mind. In particular, I'd like to know preferences in accuracy vs. latency in the service. For those of you familiar with speech recognition, you will be aware that testing involves tens or hundreds of thousands of utterances, hence my concern.

Thank you
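P.S. To make the kind of encoder tuning I mean more concrete, here is a rough sketch using the public libopus encoder API. The sample rate, bitrate, and CTL values below are only illustrative assumptions on my part, not a proposal for what an "ASR" mode should look like.

/* Hypothetical speech-oriented encoder setup using the public libopus API.
 * The specific values are examples, not recommendations. */
#include <opus.h>
#include <stdio.h>

int main(void)
{
    int err;
    /* 16 kHz mono input is typical for ASR front ends (assumption). */
    OpusEncoder *enc = opus_encoder_create(16000, 1, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) {
        fprintf(stderr, "encoder init failed: %s\n", opus_strerror(err));
        return 1;
    }

    /* Favor speech: force the signal type and cap the audio bandwidth so the
     * encoder does not spend bits on bands the recognizer may not use. */
    opus_encoder_ctl(enc, OPUS_SET_SIGNAL(OPUS_SIGNAL_VOICE));
    opus_encoder_ctl(enc, OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_WIDEBAND));

    /* DTX replaces silence with comfort-noise updates; turning it off is one
     * way to experiment with the comfort-noise question raised above. */
    opus_encoder_ctl(enc, OPUS_SET_DTX(0));

    /* Constrained bit budget, with VBR so bits follow the speech. */
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(24000));
    opus_encoder_ctl(enc, OPUS_SET_VBR(1));

    /* ... feed 20 ms frames to opus_encode() here ... */

    opus_encoder_destroy(enc);
    return 0;
}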
Hi Milan,

On 12-09-14 04:09 PM, Young, Milan wrote:
> A big part of optimizing for ASR will be an infrastructure that reports
> feedback on candidate improvements and facilitates regression testing.
> To that end, Nuance is willing to publish a service which allows
> developers to upload codec binaries to our computational grid and report
> back a score.

Did you have any thoughts yet on how you are going to give access to that? I assume Nuance doesn't want to run binaries from random people on the Internet :-)

> If such a service is of interest to you, please let me
> know of any design constraints you have in mind.

Well, we're definitely interested in adding that to our regression suite.

> In particular, I'd like to know preferences in accuracy vs. latency in
> the service. For those of you familiar with speech recognition, you
> will be aware that testing involves tens or hundreds of thousands of
> utterances, hence my concern.

I suspect we'll want the "quick" test for automated regression testing and a longer test for any experiments specifically designed to optimize ASR accuracy. What kind of times are we talking about here (just the order of magnitude would help)?

Cheers,

Jean-Marc
On Fri, Sep 14, 2012 at 1:09 PM, Young, Milan <Milan.Young at nuance.com> wrote:
> I'm interested to know of any experience with machine-"perceived" quality,
> particularly related to speech recognition or biometrics.

The closest thing is the PESQ (and PEAQ) score tests, which are computational estimates of human-perceived quality.

> I'm also interested in folks' thoughts on optimizing Opus for ASR. For
> example, removing certain classes of comfort noise, filtering non-speech
> bands, tuned VAD, etc.

Those all sound like great ideas to me. (I would add VBR strategy to the list.) The converse is also true, of course: you might well want to retrain your ASR for Opus! Remember that Opus spans two orders of magnitude in bitrate, mono vs. stereo, and at least two totally different encoding algorithms. When you don't control the encoder, you'll have to deal with the whole variety. When you do, you'll have to decide which modes are worth using, and which are not. You might even want to maintain bitrate- and mode-specific ASR models!

> One could imagine eventually rolling these updates
> back into the standard under an "ASR" mode.

This seems very unlikely to me. Opus is a decoder-specified standard, so the encoder can be modified arbitrarily without requiring re-standardization. It's hard to imagine anything worth doing that would cause you to go outside the current standard.

--Ben
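P.S. For anyone who wants to experiment with retraining an ASR model on codec-degraded audio, here is a minimal sketch of an encode/decode round trip using the public libopus API. The sample rate, frame size, and the helper name opus_round_trip are my own illustrative choices; a real harness would sweep this over the bitrates and modes you actually expect to see.

/* Minimal sketch of an Opus encode/decode round trip on 16-bit PCM, the kind
 * of pass one might use to generate codec-degraded audio for retraining or
 * evaluating an ASR model. Rates, frame size, and bitrate are illustrative. */
#include <opus.h>

#define RATE       16000
#define CHANNELS   1
#define FRAME_SIZE 320            /* 20 ms at 16 kHz */
#define MAX_PACKET 1500

/* Returns the decoded sample count, or a negative Opus error code. */
int opus_round_trip(const opus_int16 *in, opus_int16 *out, opus_int32 bitrate)
{
    int err;
    unsigned char packet[MAX_PACKET];

    OpusEncoder *enc = opus_encoder_create(RATE, CHANNELS, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) return err;
    OpusDecoder *dec = opus_decoder_create(RATE, CHANNELS, &err);
    if (err != OPUS_OK) { opus_encoder_destroy(enc); return err; }

    /* Sweep this parameter to build bitrate-specific training conditions. */
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(bitrate));

    int nbytes = opus_encode(enc, in, FRAME_SIZE, packet, MAX_PACKET);
    int nsamples = nbytes < 0 ? nbytes
                              : opus_decode(dec, packet, nbytes, out, FRAME_SIZE, 0);

    opus_encoder_destroy(enc);
    opus_decoder_destroy(dec);
    return nsamples;
}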