Hi Jean-Marc, well spotted! The patch provided fixes the issue for me. Nevertheless, in my code (and I would suggest doing the same in libopus) I am going to replace the function with one that accumulates on 64 bits and then calculates the shift, for at least 4 reasons: - It is less and simpler code - The result is likely to be slightly more accurate in case big numbers come early in the array. - I don?t like to have branching inside loops - Even though it seems to work on most compilers, assuming that overflow in the accumulator will lead to a negative result calls for undefined behaviour (NOT implementation defined): if left as is the code should at least use an unsigned accumulator and replace the test (nrg < 0) with (nrg> MAX_INT32).Can you see any issue in doing so? If you also think it might be a good idea to do the same in the libopus repository, I can send you the suggested patch. Cheers, Marcello On 18/06/2014 08:09, "Jean-Marc Valin" <jmvalin at jmvalin.ca> wrote:>Hi Marcello, > >It turns out that the problem has a much simpler explanation. As far as >I can tell, it's actually a bug in silk_sum_sqr_shift() and this trivial >patch appears to fix the problem: >http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch > >It would still require some testing to check that the fix doesn't have >any bad side effect. Let me know how well the fix works for you. Again, >thanks for investigating this. > >Cheers, > > Jean-Marc > >On 13/06/14 12:28 PM, Marcello Caramma (mcaramma) wrote: >> Hi Jean Marc, >> >> please find attached the audio file (mono 16khz). I shortened it to >>about >> 10 seconds. I also add 2 patches that worked for me. Further info that >> might help: >> >> - The problem seems to be related to silk_burg_modified not reaching the >> maximum gain, so the actual filter order is 16 rather than 2 (which is >> what would be expected with a sine wave). >> - The problem seems to happen when rshifts >= 3 >> - when pre-scaling the signal to be < 16384 the problem goes away (patch >> scale_burg_in.diff) >> - When calculating C0 and rshifts based on a 64 bits correlation instead >> of using silk_sum_sqr_shift the problem also goes away (patch >> new_C0_calc.diff) >> >> I suspect that for very high prediction gain the fixed point >> implementation becomes very sensitive to numerical error, but I am not >>too >> sure why the new versions work better. >> I favour the version with the new C0 calculation, as it avoids rescaling >> the input. >> I also played around with a version (not attached) that prescales the >> input by rshifts/2 - this might be considering as it simplifies the >>code. >> >> Regards, >> >> Marcello >> >> PS: I am using 1.1 but the same issue is present with the latest code >>well. >> >> >> On 13/06/2014 06:05, "Jean-Marc Valin" <jmvalin at jmvalin.ca> wrote: >> >>> Hi Marcello, >>> >>> Thanks for the report. It's hard to debug this without the actual file. >>> Can you please post the sweep_in.raw file you used? >>> >>> Cheers, >>> >>> Jean-Marc >>> >>> On 11/06/14 04:46 AM, Marcello Caramma (mcaramma) wrote: >>>> Hi, >>>> >>>> Apologies if this is a known issues, but I have found what I believe >>>>is >>>> a bug in the fixed point implementation of the Silk codec and could >>>>not >>>> find any mention on this in the archives. >>>> >>>> The bug can be easily reproduced with the fixed point demo program >>>> (./configure ?enable-fixed-point ?disable-float-api && make) using the >>>> following command: >>>> >>>> ./opus_demo voip 16000 1 23000 sweep_in.raw sweep_out.raw >>>> >>>> Where sweep_in.raw is a 30 seconds full scale frequency sweep from 0 >>>>to >>>> 8kHz sampled at 16kHz. >>>> >>>> The first 6 seconds of audio after transcoding sound Ok. After that >>>> artefacts are introduced all the way to the end of the file. >>>> The floating point version does not have the issue (even though the >>>> quality is subjectively worse roughly from the same point). >>>> >>>> I believe I narrowed down the problem to the file burg_modified_FIX.c >>>>? >>>> if I make sure the input signal is scaled down to 14 bits before >>>> processing the coefficients of the predictor are calculated correctly >>>> and no artefact is introduced. >>>> >>>> Is anyone experiencing the same problem or has a proper fix for this? >>>>(I >>>> can work around the bug with input scaling for now). >>>> >>>> Thanks and best regards, >>>> >>>> Marcello Caramma >>>> >>>> >>>> _______________________________________________ >>>> opus mailing list >>>> opus at xiph.org >>>> http://lists.xiph.org/mailman/listinfo/opus >>>> >>
Hi Marcello, Actually, we were careful to avoid the undefined behaviour here. In fact, we are specifically running a clang test detecting undefined behaviour. If you look at the silk_SMLABB_ovflw() macro, you will see it is based on silk_ADD32_ovflw(), which is defined as: #define silk_ADD32_ovflw(a, b) ((opus_int32)((opus_uint32)(a) + (opus_uint32)(b))) By casting to unsigned, all the cases are well defined. The cast back to int is implementation-defined but we are already assuming two's complement behaviour every time we are shifting. As for using the 64-bit version anyway, I'm not sure of the impact since I'm not the original author of the code. Koen, what are the advantages of silk_sum_sqr_shift() over the 64-bit version? See any issue with using one over the other? Cheers, Jean-Marc On 20/06/14 06:10 AM, Marcello Caramma (mcaramma) wrote:> Hi Jean-Marc, > > well spotted! The patch provided fixes the issue for me. > > Nevertheless, in my code (and I would suggest doing the same in libopus) I > am going to replace the function with one that accumulates on 64 bits and > then calculates the shift, for at least 4 reasons: > - It is less and simpler code > - The result is likely to be slightly more accurate in case big numbers > come early in the array. > - I don?t like to have branching inside loops > > - Even though it seems to work on most compilers, assuming that overflow > in the accumulator will lead to a negative result calls for undefined > behaviour (NOT implementation defined): if left as is the code should at > least use an unsigned accumulator and replace the test (nrg < 0) with (nrg >> MAX_INT32). > > Can you see any issue in doing so? > If you also think it might be a good idea to do the same in the libopus > repository, I can send you the suggested patch. > > Cheers, > > Marcello > > > On 18/06/2014 08:09, "Jean-Marc Valin" <jmvalin at jmvalin.ca> wrote: > >> Hi Marcello, >> >> It turns out that the problem has a much simpler explanation. As far as >> I can tell, it's actually a bug in silk_sum_sqr_shift() and this trivial >> patch appears to fix the problem: >> http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch >> >> It would still require some testing to check that the fix doesn't have >> any bad side effect. Let me know how well the fix works for you. Again, >> thanks for investigating this. >> >> Cheers, >> >> Jean-Marc >> >> On 13/06/14 12:28 PM, Marcello Caramma (mcaramma) wrote: >>> Hi Jean Marc, >>> >>> please find attached the audio file (mono 16khz). I shortened it to >>> about >>> 10 seconds. I also add 2 patches that worked for me. Further info that >>> might help: >>> >>> - The problem seems to be related to silk_burg_modified not reaching the >>> maximum gain, so the actual filter order is 16 rather than 2 (which is >>> what would be expected with a sine wave). >>> - The problem seems to happen when rshifts >= 3 >>> - when pre-scaling the signal to be < 16384 the problem goes away (patch >>> scale_burg_in.diff) >>> - When calculating C0 and rshifts based on a 64 bits correlation instead >>> of using silk_sum_sqr_shift the problem also goes away (patch >>> new_C0_calc.diff) >>> >>> I suspect that for very high prediction gain the fixed point >>> implementation becomes very sensitive to numerical error, but I am not >>> too >>> sure why the new versions work better. >>> I favour the version with the new C0 calculation, as it avoids rescaling >>> the input. >>> I also played around with a version (not attached) that prescales the >>> input by rshifts/2 - this might be considering as it simplifies the >>> code. >>> >>> Regards, >>> >>> Marcello >>> >>> PS: I am using 1.1 but the same issue is present with the latest code >>> well. >>> >>> >>> On 13/06/2014 06:05, "Jean-Marc Valin" <jmvalin at jmvalin.ca> wrote: >>> >>>> Hi Marcello, >>>> >>>> Thanks for the report. It's hard to debug this without the actual file. >>>> Can you please post the sweep_in.raw file you used? >>>> >>>> Cheers, >>>> >>>> Jean-Marc >>>> >>>> On 11/06/14 04:46 AM, Marcello Caramma (mcaramma) wrote: >>>>> Hi, >>>>> >>>>> Apologies if this is a known issues, but I have found what I believe >>>>> is >>>>> a bug in the fixed point implementation of the Silk codec and could >>>>> not >>>>> find any mention on this in the archives. >>>>> >>>>> The bug can be easily reproduced with the fixed point demo program >>>>> (./configure ?enable-fixed-point ?disable-float-api && make) using the >>>>> following command: >>>>> >>>>> ./opus_demo voip 16000 1 23000 sweep_in.raw sweep_out.raw >>>>> >>>>> Where sweep_in.raw is a 30 seconds full scale frequency sweep from 0 >>>>> to >>>>> 8kHz sampled at 16kHz. >>>>> >>>>> The first 6 seconds of audio after transcoding sound Ok. After that >>>>> artefacts are introduced all the way to the end of the file. >>>>> The floating point version does not have the issue (even though the >>>>> quality is subjectively worse roughly from the same point). >>>>> >>>>> I believe I narrowed down the problem to the file burg_modified_FIX.c >>>>> ? >>>>> if I make sure the input signal is scaled down to 14 bits before >>>>> processing the coefficients of the predictor are calculated correctly >>>>> and no artefact is introduced. >>>>> >>>>> Is anyone experiencing the same problem or has a proper fix for this? >>>>> (I >>>>> can work around the bug with input scaling for now). >>>>> >>>>> Thanks and best regards, >>>>> >>>>> Marcello Caramma >>>>> >>>>> >>>>> _______________________________________________ >>>>> opus mailing list >>>>> opus at xiph.org >>>>> http://lists.xiph.org/mailman/listinfo/opus >>>>> >>> >
Right, there shouldn't be a problem with undefined behavior. That said, a 64 bit implementation will work very well - in fact that's how it was done originally. The reason for the current implementation is to minimize 64-bit operations in order to improve performance on limited-width architectures. This functions gets used extensively, and I think the current implementation is faster on a 32 or 16 bit processor. If you would find the opposite to be true (ie that a 64 bit implementation is faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider. Btw, thanks for your help tracking down that bug! koen. On Fri, Jun 20, 2014 at 9:46 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:> Hi Marcello, > > Actually, we were careful to avoid the undefined behaviour here. In > fact, we are specifically running a clang test detecting undefined > behaviour. If you look at the silk_SMLABB_ovflw() macro, you will see it > is based on silk_ADD32_ovflw(), which is defined as: > > #define silk_ADD32_ovflw(a, b) ((opus_int32)((opus_uint32)(a) + > (opus_uint32)(b))) > > By casting to unsigned, all the cases are well defined. The cast back to > int is implementation-defined but we are already assuming two's > complement behaviour every time we are shifting. > > As for using the 64-bit version anyway, I'm not sure of the impact since > I'm not the original author of the code. Koen, what are the advantages > of silk_sum_sqr_shift() over the 64-bit version? See any issue with > using one over the other? > > Cheers, > > Jean-Marc > > On 20/06/14 06:10 AM, Marcello Caramma (mcaramma) wrote: > > Hi Jean-Marc, > > > > well spotted! The patch provided fixes the issue for me. > > > > Nevertheless, in my code (and I would suggest doing the same in libopus) > I > > am going to replace the function with one that accumulates on 64 bits and > > then calculates the shift, for at least 4 reasons: > > - It is less and simpler code > > - The result is likely to be slightly more accurate in case big numbers > > come early in the array. > > - I don?t like to have branching inside loops > > > > - Even though it seems to work on most compilers, assuming that overflow > > in the accumulator will lead to a negative result calls for undefined > > behaviour (NOT implementation defined): if left as is the code should at > > least use an unsigned accumulator and replace the test (nrg < 0) with > (nrg > >> MAX_INT32). > > > > Can you see any issue in doing so? > > If you also think it might be a good idea to do the same in the libopus > > repository, I can send you the suggested patch. > > > > Cheers, > > > > Marcello > > > > > > On 18/06/2014 08:09, "Jean-Marc Valin" <jmvalin at jmvalin.ca> wrote: > > > >> Hi Marcello, > >> > >> It turns out that the problem has a much simpler explanation. As far as > >> I can tell, it's actually a bug in silk_sum_sqr_shift() and this trivial > >> patch appears to fix the problem: > >> http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch > >> > >> It would still require some testing to check that the fix doesn't have > >> any bad side effect. Let me know how well the fix works for you. Again, > >> thanks for investigating this. > >> > >> Cheers, > >> > >> Jean-Marc > >> > >> On 13/06/14 12:28 PM, Marcello Caramma (mcaramma) wrote: > >>> Hi Jean Marc, > >>> > >>> please find attached the audio file (mono 16khz). I shortened it to > >>> about > >>> 10 seconds. I also add 2 patches that worked for me. Further info that > >>> might help: > >>> > >>> - The problem seems to be related to silk_burg_modified not reaching > the > >>> maximum gain, so the actual filter order is 16 rather than 2 (which is > >>> what would be expected with a sine wave). > >>> - The problem seems to happen when rshifts >= 3 > >>> - when pre-scaling the signal to be < 16384 the problem goes away > (patch > >>> scale_burg_in.diff) > >>> - When calculating C0 and rshifts based on a 64 bits correlation > instead > >>> of using silk_sum_sqr_shift the problem also goes away (patch > >>> new_C0_calc.diff) > >>> > >>> I suspect that for very high prediction gain the fixed point > >>> implementation becomes very sensitive to numerical error, but I am not > >>> too > >>> sure why the new versions work better. > >>> I favour the version with the new C0 calculation, as it avoids > rescaling > >>> the input. > >>> I also played around with a version (not attached) that prescales the > >>> input by rshifts/2 - this might be considering as it simplifies the > >>> code. > >>> > >>> Regards, > >>> > >>> Marcello > >>> > >>> PS: I am using 1.1 but the same issue is present with the latest code > >>> well. > >>> > >>> > >>> On 13/06/2014 06:05, "Jean-Marc Valin" <jmvalin at jmvalin.ca> wrote: > >>> > >>>> Hi Marcello, > >>>> > >>>> Thanks for the report. It's hard to debug this without the actual > file. > >>>> Can you please post the sweep_in.raw file you used? > >>>> > >>>> Cheers, > >>>> > >>>> Jean-Marc > >>>> > >>>> On 11/06/14 04:46 AM, Marcello Caramma (mcaramma) wrote: > >>>>> Hi, > >>>>> > >>>>> Apologies if this is a known issues, but I have found what I believe > >>>>> is > >>>>> a bug in the fixed point implementation of the Silk codec and could > >>>>> not > >>>>> find any mention on this in the archives. > >>>>> > >>>>> The bug can be easily reproduced with the fixed point demo program > >>>>> (./configure ?enable-fixed-point ?disable-float-api && make) using > the > >>>>> following command: > >>>>> > >>>>> ./opus_demo voip 16000 1 23000 sweep_in.raw sweep_out.raw > >>>>> > >>>>> Where sweep_in.raw is a 30 seconds full scale frequency sweep from 0 > >>>>> to > >>>>> 8kHz sampled at 16kHz. > >>>>> > >>>>> The first 6 seconds of audio after transcoding sound Ok. After that > >>>>> artefacts are introduced all the way to the end of the file. > >>>>> The floating point version does not have the issue (even though the > >>>>> quality is subjectively worse roughly from the same point). > >>>>> > >>>>> I believe I narrowed down the problem to the file burg_modified_FIX.c > >>>>> ? > >>>>> if I make sure the input signal is scaled down to 14 bits before > >>>>> processing the coefficients of the predictor are calculated correctly > >>>>> and no artefact is introduced. > >>>>> > >>>>> Is anyone experiencing the same problem or has a proper fix for this? > >>>>> (I > >>>>> can work around the bug with input scaling for now). > >>>>> > >>>>> Thanks and best regards, > >>>>> > >>>>> Marcello Caramma > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> opus mailing list > >>>>> opus at xiph.org > >>>>> http://lists.xiph.org/mailman/listinfo/opus > >>>>> > >>> > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.xiph.org/pipermail/opus/attachments/20140620/519738a7/attachment-0001.htm