Hello,

All of the Opus quality studies that I've seen focus on human-perceived quality. I'm interested to know of any experience with machine-"perceived" quality, particularly related to speech recognition or biometrics.

I'm also interested in folks' thoughts on optimizing Opus for ASR. For example, removing certain classes of comfort noise, filtering non-speech bands, tuned VAD, etc. One could imagine eventually rolling these updates back into the standard under an "ASR" mode.

A big part of optimizing for ASR will be an infrastructure that reports feedback on candidate improvements and facilitates regression testing. To that end, Nuance is willing to publish a service which allows developers to upload codec binaries to our computational grid and report back a score. If such a service is of interest to you, please let me know of any design constraints you have in mind. In particular, I'd like to know preferences in accuracy vs. latency in the service. For those of you familiar with speech recognition, you will be aware that testing involves tens or hundreds of thousands of utterances, hence my concern.

Thank you
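P.S. To make the kind of encoder tuning I mean more concrete, here is a rough sketch using the public libopus encoder API. The sample rate, bitrate, and CTL values below are only illustrative assumptions on my part, not a proposal for what an "ASR" mode should look like.

/* Hypothetical speech-oriented encoder setup using the public libopus API.
 * The specific values are examples, not recommendations. */
#include <opus.h>
#include <stdio.h>

int main(void)
{
    int err;
    /* 16 kHz mono input is typical for ASR front ends (assumption). */
    OpusEncoder *enc = opus_encoder_create(16000, 1, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) {
        fprintf(stderr, "encoder init failed: %s\n", opus_strerror(err));
        return 1;
    }

    /* Favor speech: force the signal type and cap the audio bandwidth so the
     * encoder does not spend bits on bands the recognizer may not use. */
    opus_encoder_ctl(enc, OPUS_SET_SIGNAL(OPUS_SIGNAL_VOICE));
    opus_encoder_ctl(enc, OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_WIDEBAND));

    /* DTX replaces silence with comfort-noise updates; turning it off is one
     * way to experiment with the comfort-noise question raised above. */
    opus_encoder_ctl(enc, OPUS_SET_DTX(0));

    /* Constrained bit budget, with VBR so bits follow the speech. */
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(24000));
    opus_encoder_ctl(enc, OPUS_SET_VBR(1));

    /* ... feed 20 ms frames to opus_encode() here ... */

    opus_encoder_destroy(enc);
    return 0;
}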
Hi Milan,

On 12-09-14 04:09 PM, Young, Milan wrote:
> A big part of optimizing for ASR will be an infrastructure that reports
> feedback on candidate improvements and facilitates regression testing.
> To that end, Nuance is willing to publish a service which allows
> developers to upload codec binaries to our computational grid and report
> back a score.

Did you have any thoughts yet on how you are going to give access to that? I assume Nuance doesn't want to run binaries from random people on the Internet :-)

> If such a service is of interest to you, please let me
> know of any design constraints you have in mind.

Well, we're definitely interested in adding that to our regression suite.

> In particular, I'd like to know preferences in accuracy vs. latency in
> the service. For those of you familiar with speech recognition, you
> will be aware that testing involves tens or hundreds of thousands of
> utterances, hence my concern.

I suspect we'll want the "quick" test for automated regression testing and a longer test for any experiments specifically designed to optimize ASR accuracy. What kind of times are we talking about here (just the order of magnitude would help)?

Cheers,

Jean-Marc
On Fri, Sep 14, 2012 at 1:09 PM, Young, Milan <Milan.Young at nuance.com> wrote:
> I'm interested to know of any experience with machine-"perceived" quality,
> particularly related to speech recognition or biometrics.

The closest thing is the PESQ (and PEAQ) score tests, which are computational estimates of human-perceived quality.

> I'm also interested in folks' thoughts on optimizing Opus for ASR. For
> example, removing certain classes of comfort noise, filtering non-speech
> bands, tuned VAD, etc.

Those all sound like great ideas to me. (I would add VBR strategy to the list.) The converse is also true, of course: you might well want to retrain your ASR for Opus! Remember that Opus spans two orders of magnitude in bitrate, mono vs. stereo, and at least two totally different encoding algorithms. When you don't control the encoder, you'll have to deal with the whole variety. When you do, you'll have to decide which modes are worth using, and which are not. You might even want to maintain bitrate- and mode-specific ASR models!

> One could imagine eventually rolling these updates
> back into the standard under an "ASR" mode.

This seems very unlikely to me. Opus is a decoder-specified standard, so the encoder can be modified arbitrarily without requiring re-standardization. It's hard to imagine anything worth doing that would cause you to go outside the current standard.

--Ben
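P.S. For anyone who wants to experiment with retraining an ASR model on codec-degraded audio, here is a minimal sketch of an encode/decode round trip using the public libopus API. The sample rate, frame size, and the helper name opus_round_trip are my own illustrative choices; a real harness would sweep this over the bitrates and modes you actually expect to see.

/* Minimal sketch of an Opus encode/decode round trip on 16-bit PCM, the kind
 * of pass one might use to generate codec-degraded audio for retraining or
 * evaluating an ASR model. Rates, frame size, and bitrate are illustrative. */
#include <opus.h>

#define RATE       16000
#define CHANNELS   1
#define FRAME_SIZE 320            /* 20 ms at 16 kHz */
#define MAX_PACKET 1500

/* Returns the decoded sample count, or a negative Opus error code. */
int opus_round_trip(const opus_int16 *in, opus_int16 *out, opus_int32 bitrate)
{
    int err;
    unsigned char packet[MAX_PACKET];

    OpusEncoder *enc = opus_encoder_create(RATE, CHANNELS, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) return err;
    OpusDecoder *dec = opus_decoder_create(RATE, CHANNELS, &err);
    if (err != OPUS_OK) { opus_encoder_destroy(enc); return err; }

    /* Sweep this parameter to build bitrate-specific training conditions. */
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(bitrate));

    int nbytes = opus_encode(enc, in, FRAME_SIZE, packet, MAX_PACKET);
    int nsamples = nbytes < 0 ? nbytes
                              : opus_decode(dec, packet, nbytes, out, FRAME_SIZE, 0);

    opus_encoder_destroy(enc);
    opus_decoder_destroy(dec);
    return nsamples;
}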