For the last couple of months, Nuance has performed extensive testing on how the Opus codec performs in the speech recognition task. I'm hoping to publish a full report in the coming months, but until then all I have is a teaser. Opus performed within about 1% of the WER (Word Error Rate) of unencoded audio. This is compared to about 5% for Speex, which was the previous codec of choice. Well done to you all!

As Nuance considers migrating to Opus, we'd like to consider the topic of transport. Traditionally we've relied on TCP for reasons of reliability. Opus, with its packet redundancy features, offers an attractive real-time alternative that we will soon be testing. But in order to make an apples-to-apples comparison we need to model both data rates and latency in real-world scenarios.

For UDP, I'm assuming that the redundancy feature adds no additional latency. Correct? On the data rate question, I see that the opusenc tool provides an "expect-loss" parameter with the value expressed as a percentage. Could someone please describe how this is implemented? Are you simply removing some percentage of packets from the result, or is there a more complex model underpinning the exercise?

Modeling TCP data rates and latency in similarly lossy scenarios seems much more difficult since dropped packets have cascading effects. Has anyone on this list considered this class of comparison? Any suggestions for modeling software that could aid my search?

Thank you
Hello Milan Young,

thanks for doing the ASR testing. We would be happy to consider your results for inclusion in the Opus characterization draft. Please feel free to post them to the Opus IETF mailing list, too.

With best regards,

Christian Hoene

--
Universität Tübingen, Sand 13, 72076 Tübingen, Germany
Tel +49 7071 2970532, Fax +49 7071 5220
http://kn.inf.uni-tuebingen.de/staff/hoene.html

From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Young, Milan
Sent: Wednesday, 28 November 2012 12:51
To: opus at xiph.org
Subject: [opus] Opus for ASR - update and questions
The Low Bit-Rate Redundancy (LBRR) feature works by sometimes adding another copy of packet N-1 onto packet N. This means that a traditional decoder that wishes to use the LBRR info must increase its jitter buffer depth, and hence its latency, by one packet, so that if packet K is dropped, there is time to receive and decode packet K+1.

However, in the case of ASR I think this logic does not apply, and the latency penalty is paid only in the rare case when the last packet before a data return event is dropped.

Expect-loss is not a simulation of loss. It is a way to tell the encoder "I expect that my network will drop X% of packets". The higher the value of X, the more bits the encoder will spend to avoid depending on previous packets that may not have arrived. This means increasing the use of LBRR, decreasing the use of sensitive long-term filters, and many other changes in encoding strategy.
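The LBRR latency argument in the reply above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not how libopus behaves internally: every packet n is assumed to carry a redundant copy of frame n-1 (real LBRR adds it only sometimes), the recognizer is assumed to consume frames as they arrive and to keep up with real time, and the 20 ms frame size and all function names are hypothetical.

```python
FRAME_MS = 20  # assumed frame duration for this sketch


def deliver(num_frames, lost):
    """Return, per frame, the index of the packet that delivers it, or None.

    Toy assumption: packet n carries frame n plus a redundant copy of
    frame n-1. `lost` is the set of packet indices dropped by the network.
    """
    delivered_by = []
    for n in range(num_frames):
        if n not in lost:
            delivered_by.append(n)        # primary copy arrived
        elif n + 1 < num_frames and (n + 1) not in lost:
            delivered_by.append(n + 1)    # recovered from the copy in packet n+1
        else:
            delivered_by.append(None)     # unrecoverable in this model
    return delivered_by


def result_delay_ms(delivered_by, last):
    """Extra wait before a result that needs frames 0..last can be returned.

    Only the arrival of the latest needed packet matters, because earlier
    late frames have already been processed by the time `last` arrives.
    """
    arrivals = [p for p in delivered_by[:last + 1] if p is not None]
    latest = max(arrivals, default=last)
    return max(0, latest - last) * FRAME_MS


if __name__ == "__main__":
    # Loss in the middle of the utterance: no extra delay for the result.
    print(result_delay_ms(deliver(6, lost={2}), last=4))
    # Loss of the last frame the result depends on: one frame of extra delay.
    print(result_delay_ms(deliver(6, lost={4}), last=4))
```

In this model a mid-utterance loss costs nothing at the point where the result is returned, while losing the final needed packet costs one frame period, which is the case the reply describes as rare. A playback-style decoder with a fixed jitter buffer would instead pay the one-packet delay on every frame.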