thr3ads.net - opus - [opus] Reg an issue with smoothing factor in VAD implementation [Nov 2017]


Logan Stromberg

2017-Nov-20 21:08 UTC

[opus] Reg an issue with smoothing factor in VAD implementation

Just for fun, I tried to reproduce such an overflow. I turned on all debug
macros, assertions, and checked arithmetic and then encoded 2 hours of
mixed speech/audio with these parameters:

Sample rate = 48000
Channels = 1
Application = OPUS_APPLICATION_AUDIO
Bitrate = 24 KB/s
Force Mode = MODE_SILK_ONLY
Signal Type = OPUS_SIGNAL_AUTO
Complexity = 10
Frame size = 480 samples (10ms)

No errors came up in encoding. Chandrakala, are these the encoding
parameters that you believe should trigger the error?

- Logan

> Hi,
>
> We are looking at the VAD implementation used in Opus, specifically the
> code where the speech probability is calculated from the estimated SNR.
> Below is the part of the code I am talking about.
>
> /*********************************/
> /* Speech Probability Estimation */
> /*********************************/
> // step1: Calculate speech probability : comment by me
> SA_Q15 = silk_sigm_Q15( silk_SMULWB( VAD_SNR_FACTOR_Q16, pSNR_dB_Q7 ) -
>                         VAD_NEGATIVE_OFFSET_Q5 );
>
> /* Power scaling */
> // step2: update speech probability based on speech energy : comment by me
> if( speech_nrg <= 0 ) {
>     SA_Q15 = silk_RSHIFT( SA_Q15, 1 );
> } else if( speech_nrg < 32768 ) {
>     if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
>         // Energy is doubled here : comment by me
>         speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 16 );
>     } else {
>         speech_nrg = silk_LSHIFT_SAT32( speech_nrg, 15 );
>     }
>
>     /* square-root */
>     speech_nrg = silk_SQRT_APPROX( speech_nrg );
>     SA_Q15 = silk_SMULWB( 32768 + speech_nrg, SA_Q15 );
> }
>
> /* Smoothing coefficient */
> // step3: Update the smoothing factor based on speech probability : comment by me
> smooth_coef_Q16 = silk_SMULWB( VAD_SNR_SMOOTH_COEF_Q18,
>                                silk_SMULWB( (opus_int32)SA_Q15, SA_Q15 ) );
>
> if( psEncC->frame_length == 10 * psEncC->fs_kHz ) {
>     smooth_coef_Q16 >>= 1;
> }
>
> Here, in step 1, the speech probability is calculated; its value is
> expected to lie within [0, 1) in Q15 format. In step 2 the probability is
> updated based on the speech energy level, and the result should also lie
> within [0, 1). In step 3 the smoothing coefficient is calculated. This code
> has no issue when the frame size is 20 ms or more. But if the frame size is
> 10 ms, the energy is doubled in step 2 (possibly because the original SILK
> code was written for 20 ms frames, and doubling converts the energy to a
> 20 ms equivalent). When this is done, the probability updated in step 2 can
> exceed 1. When that value is used in the multiplication in step 3, it is
> treated as a negative number because it's a 32x16 multiplication. This
> results in a negative smoothing coefficient.
> Please let me know if this is a bug.
>
>
> Thank you,
> Chandrakala

Chandrakala Madhira

2017-Nov-22 04:23 UTC


[opus] Reg an issue with smoothing factor in VAD implementation

Hi Logan, 

Below is the configuration I am using. The remaining parameters are left at
their defaults. The stream is a 16 kHz stream, and the selected bitrate is
16000 bps. As the list has size limits, I will try to attach the stream in the
next mail. At the 125th frame of this stream, the probability (SA_Q15) value is
0x0000859c. This value is treated as a negative number in step 3.

-e voip 16000 1 16000 -framesize 10 -cvbr Input/dg105_16k.wav
Output/dg105_16k_16000Fs_mono_16000bps_10ms_vbr_voip.bit
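
To make the reported value concrete, here is a minimal standalone sketch of the step 3 arithmetic. It assumes `silk_SMULWB(a32, b32)` behaves as `(a32 * (opus_int16)b32) >> 16`; the `smulwb` and `smooth_coef_q16` helpers are illustrative stand-ins, not Opus code, and the smoothing constant is passed in as a parameter rather than taken from the Opus headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for silk_SMULWB: multiply a 32-bit value by the low 16 bits of
 * b32 (read as signed) and drop the bottom 16 bits of the product.
 * Assumption: the int16_t conversion wraps and >> on a negative value is an
 * arithmetic shift. Both are implementation-defined in C, but this is how
 * mainstream compilers behave. */
static int32_t smulwb(int32_t a32, int32_t b32)
{
    return (int32_t)(((int64_t)a32 * (int16_t)b32) >> 16);
}

/* Step 3 from the quoted code, with the Q18 constant supplied by the caller. */
static int32_t smooth_coef_q16(int32_t sa_q15, int32_t coef_q18)
{
    assert(sa_q15 >= 0); /* in 32-bit storage, step 2 gives a non-negative value */
    return smulwb(coef_q18, smulwb(sa_q15, sa_q15));
}
```

With `sa_q15 = 0x859c` (34204), the 16-bit cast reads the second operand as 34204 - 65536 = -31332, so `smulwb(0x859c, 0x859c)` is negative and any positive `coef_q18` then gives a negative smoothing coefficient. With `sa_q15 = 0x7fff` the same call stays positive.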

Thank you, 
Chandrakala 



Logan Stromberg

2017-Nov-22 06:42 UTC


[opus] Reg an issue with smoothing factor in VAD implementation

Yes, yes, I can reproduce it now, but only on platforms that define a
16-bit int by default (SA_Q15 is an opus_int rather than opus_int32). What
system are you compiling this for?
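
One way to make the condition visible regardless of how wide `opus_int` is would be to check the range of the 16-bit operand before multiplying. The sketch below is hypothetical: `fits_int16` and `smulwb_checked` are names invented for illustration and are not part of Opus or its debug macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when v is unchanged by a conversion to a signed 16-bit value. */
static int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Hypothetical checked variant of a 32x16 multiply-and-shift.
 * Always writes the usual truncating result to *out; returns 1 when b32 was
 * representable in 16 bits and 0 when the cast changed its value.
 * Assumes wrapping int16_t conversion and arithmetic right shift. */
static int smulwb_checked(int32_t a32, int32_t b32, int32_t *out)
{
    assert(out != NULL);
    *out = (int32_t)(((int64_t)a32 * (int16_t)b32) >> 16);
    return fits_int16(b32);
}
```

Under these assumptions, calling `smulwb_checked(sa, sa, &r)` with `sa = 0x859c` returns 0 even though `sa` is held in a 32-bit variable, because the truncation happens inside the multiply rather than at the point where `SA_Q15` is stored.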

On Tue, Nov 21, 2017 at 8:34 PM, Chandrakala Madhira <
chandrakala.madhira at soctronics.com> wrote:
> Hi Logan,
>
> Please find attached the input stream we are using for testing.
>
> Thank you,
> Chandrakala

Chandrakala Madhira

2017-Nov-22 10:43 UTC


[opus] Reg an issue with smoothing factor in VAD implementation

Hi Logan, 

We are building with MSVC. Here opus_int is a 32-bit value, but when it comes
to the multiplication only the lower 16 bits are used, interpreted as a signed
value. At that point the positive value becomes negative. I think this can
happen on all platforms, so I am wondering whether this is a bug.
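
The step 2 bound can be checked with a small sketch of the power-scaling branch. It is only an approximation of the quoted code: an exact integer square root (`isqrt32`, a hypothetical helper) stands in for `silk_SQRT_APPROX`, and `power_scale` is an illustrative name, so the real encoder may differ slightly in the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Exact integer square root, used here in place of silk_SQRT_APPROX. */
static uint32_t isqrt32(uint32_t x)
{
    uint32_t r = 0, bit = 1u << 30;
    while (bit > x) bit >>= 2;
    while (bit != 0) {
        if (x >= r + bit) { x -= r + bit; r = (r >> 1) + bit; }
        else              { r >>= 1; }
        bit >>= 2;
    }
    return r;
}

/* The "0 < speech_nrg < 32768" branch of step 2.
 * is_10ms selects the shift by 16 (10 ms frames) instead of 15.
 * No saturation is needed: 32767 << 16 still fits in 31 bits. */
static int32_t power_scale(int32_t sa_q15, int32_t speech_nrg, int is_10ms)
{
    assert(speech_nrg > 0 && speech_nrg < 32768);
    uint32_t scaled = (uint32_t)speech_nrg << (is_10ms ? 16 : 15);
    int32_t gain = 32768 + (int32_t)isqrt32(scaled);
    /* Same 32x16 multiply-and-shift as in step 3. */
    return (int32_t)(((int64_t)gain * (int16_t)sa_q15) >> 16);
}
```

With the shift by 15 the gain stays below 65536, so the result cannot exceed the incoming `sa_q15`. With the shift by 16 the gain can reach 32768 + 46340 = 79108, roughly 1.2 times the input after the `>> 16`, which is enough to push a probability close to 32767 past the signed 16-bit limit, as with the 0x859c value reported in this thread.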


Thank you, 
Chandrakala 


Chandrakala Madhira

2017-Nov-27 04:07 UTC


[opus] Reg an issue with smoothing factor in VAD implementation

Hi, 

Can anyone let me know if this is a bug? 


Thank you, 
Chandrakala 

