thr3ads.net - opus - [opus] [EXTERNAL] Re: Submitting a patch that exposes VAD voiced/unvoiced signal type [Jun 2017]

Jean-Marc Valin

2017-Jun-07 06:46 UTC

[opus] Submitting a patch that exposes VAD voiced/unvoiced signal type

Hi Peter,

There are two main issues with a patch like the one you're proposing.
First, the data is only valid when SILK is being used and is essentially
undefined in CELT mode. The second issue is that by exposing internals,
it makes it impossible to improve these algorithms since it would break
API compatibility. I'm not fundamentally against trying to expose some
information, but there would have to be a way to address those two issues.

On a slightly different topic, have you looked at the VAD probability
that's computed in analysis.c (along with the speech/music probability)?

Cheers,

	Jean-Marc

> I'm reaching out because we'd like to contribute back to the project
> a patch that exposes the signal type of the audio packet when
> encoding the PCM audio to OPUS. We've found the Opus VAD algorithm to
> be exceptional in this regard and have written a library that
> leverages this information for audio end-pointing. Attached is the
> patch. Please let us know if you'd be willing to accept it, or if
> you'd prefer we fork libopus or recommend some other option.
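
As an illustration of the first issue raised above (SILK-derived data being undefined in CELT mode), the coding mode of a packet can be read from its first byte, the TOC byte, whose top five bits hold the configuration number defined in RFC 6716, section 3.1. The following is only a minimal sketch and is not taken from the patch under discussion or from libopus; the names `packet_mode`, `mode_from_toc` and `silk_info_available` are hypothetical.

```c
/* Hypothetical names for illustration only; these are not libopus identifiers. */
typedef enum {
    MODE_SILK_ONLY,
    MODE_HYBRID,
    MODE_CELT_ONLY
} packet_mode;

/* Classify the coding mode from the TOC byte (first byte of an Opus packet).
 * Per RFC 6716 section 3.1, the top 5 bits are the configuration number:
 *   0..11  SILK-only, 12..15 hybrid, 16..31 CELT-only. */
packet_mode mode_from_toc(unsigned char toc)
{
    unsigned config = toc >> 3;
    if (config < 12)
        return MODE_SILK_ONLY;
    if (config < 16)
        return MODE_HYBRID;
    return MODE_CELT_ONLY;
}

/* Returns 1 when the packet contains a SILK layer (SILK-only or hybrid),
 * 0 for CELT-only packets, where a SILK signal type would be undefined. */
int silk_info_available(unsigned char toc)
{
    return mode_from_toc(toc) != MODE_CELT_ONLY;
}
```

A caller that consumed a SILK-derived value would check `silk_info_available(packet[0])` first and treat the value as absent for CELT-only packets. This addresses only the validity question, not the API-compatibility concern.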



Freshman, Peter

2017-Jun-08 12:20 UTC


[opus] [EXTERNAL] Re: Submitting a patch that exposes VAD voiced/unvoiced signal type

Hi Jean-Marc,

Thank you for the valuable feedback. You're correct in that we focused on
enabling this just for SILK. Because our solutions are focused on voice, we did
not explore doing the same in CELT mode, but we can certainly look into the
details of analysis.c.


Regarding the concern of exposing internals, do you have a specific proposal in
mind?


We've been sharing this patch with our customers over the last several
months, and the preference obviously would be to have it in the public domain.
We're interested in any opportunity to accelerate this.


Thanks,
Peter


Jean-Marc Valin

2017-Jun-16 18:27 UTC


[opus] [EXTERNAL] Re: Submitting a patch that exposes VAD voiced/unvoiced signal type

Hi Peter,

Can you say a little bit more about what you're doing exactly with the
information you're exposing and how? Unfortunately, I don't have a
concrete proposal in mind right now. That's in part because I don't
quite understand the use case, but also because it's really hard to
expose this kind of information in a way that both avoids breaking
applications with new versions and doesn't prevent future improvements to
Opus.

Cheers,

	Jean-Marc

