Right, there shouldn't be a problem with undefined behavior.
That said, a 64 bit implementation will work very well - in fact that's how
it was done originally.
The reason for the current implementation is to minimize 64-bit operations
in order to improve performance on limited-width architectures. This
functions gets used extensively, and I think the current implementation is
faster on a 32 or 16 bit processor. If you would find the opposite to be
true (ie that a 64 bit implementation is faster on, say, a 32 bit ARM CPU)
then perhaps we should reconsider.
Btw, thanks for your help tracking down that bug!
koen.
On Fri, Jun 20, 2014 at 9:46 AM, Jean-Marc Valin <jmvalin at jmvalin.ca>
wrote:
> Hi Marcello,
>
> Actually, we were careful to avoid the undefined behaviour here. In
> fact, we are specifically running a clang test detecting undefined
> behaviour. If you look at the silk_SMLABB_ovflw() macro, you will see it
> is based on silk_ADD32_ovflw(), which is defined as:
>
> #define silk_ADD32_ovflw(a, b) ((opus_int32)((opus_uint32)(a) +
> (opus_uint32)(b)))
>
> By casting to unsigned, all the cases are well defined. The cast back to
> int is implementation-defined but we are already assuming two's
> complement behaviour every time we are shifting.
>
> As for using the 64-bit version anyway, I'm not sure of the impact
since
> I'm not the original author of the code. Koen, what are the advantages
> of silk_sum_sqr_shift() over the 64-bit version? See any issue with
> using one over the other?
>
> Cheers,
>
> Jean-Marc
>
> On 20/06/14 06:10 AM, Marcello Caramma (mcaramma) wrote:
> > Hi Jean-Marc,
> >
> > well spotted! The patch provided fixes the issue for me.
> >
> > Nevertheless, in my code (and I would suggest doing the same in
libopus)
> I
> > am going to replace the function with one that accumulates on 64 bits
and
> > then calculates the shift, for at least 4 reasons:
> > - It is less and simpler code
> > - The result is likely to be slightly more accurate in case big
numbers
> > come early in the array.
> > - I don?t like to have branching inside loops
> >
> > - Even though it seems to work on most compilers, assuming that
overflow
> > in the accumulator will lead to a negative result calls for undefined
> > behaviour (NOT implementation defined): if left as is the code should
at
> > least use an unsigned accumulator and replace the test (nrg < 0)
with
> (nrg
> >> MAX_INT32).
> >
> > Can you see any issue in doing so?
> > If you also think it might be a good idea to do the same in the
libopus
> > repository, I can send you the suggested patch.
> >
> > Cheers,
> >
> > Marcello
> >
> >
> > On 18/06/2014 08:09, "Jean-Marc Valin" <jmvalin at
jmvalin.ca> wrote:
> >
> >> Hi Marcello,
> >>
> >> It turns out that the problem has a much simpler explanation. As
far as
> >> I can tell, it's actually a bug in silk_sum_sqr_shift() and
this trivial
> >> patch appears to fix the problem:
> >> http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch
> >>
> >> It would still require some testing to check that the fix
doesn't have
> >> any bad side effect. Let me know how well the fix works for you.
Again,
> >> thanks for investigating this.
> >>
> >> Cheers,
> >>
> >> Jean-Marc
> >>
> >> On 13/06/14 12:28 PM, Marcello Caramma (mcaramma) wrote:
> >>> Hi Jean Marc,
> >>>
> >>> please find attached the audio file (mono 16khz). I shortened
it to
> >>> about
> >>> 10 seconds. I also add 2 patches that worked for me. Further
info that
> >>> might help:
> >>>
> >>> - The problem seems to be related to silk_burg_modified not
reaching
> the
> >>> maximum gain, so the actual filter order is 16 rather than 2
(which is
> >>> what would be expected with a sine wave).
> >>> - The problem seems to happen when rshifts >= 3
> >>> - when pre-scaling the signal to be < 16384 the problem
goes away
> (patch
> >>> scale_burg_in.diff)
> >>> - When calculating C0 and rshifts based on a 64 bits
correlation
> instead
> >>> of using silk_sum_sqr_shift the problem also goes away (patch
> >>> new_C0_calc.diff)
> >>>
> >>> I suspect that for very high prediction gain the fixed point
> >>> implementation becomes very sensitive to numerical error, but
I am not
> >>> too
> >>> sure why the new versions work better.
> >>> I favour the version with the new C0 calculation, as it avoids
> rescaling
> >>> the input.
> >>> I also played around with a version (not attached) that
prescales the
> >>> input by rshifts/2 - this might be considering as it
simplifies the
> >>> code.
> >>>
> >>> Regards,
> >>>
> >>> Marcello
> >>>
> >>> PS: I am using 1.1 but the same issue is present with the
latest code
> >>> well.
> >>>
> >>>
> >>> On 13/06/2014 06:05, "Jean-Marc Valin" <jmvalin
at jmvalin.ca> wrote:
> >>>
> >>>> Hi Marcello,
> >>>>
> >>>> Thanks for the report. It's hard to debug this without
the actual
> file.
> >>>> Can you please post the sweep_in.raw file you used?
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Jean-Marc
> >>>>
> >>>> On 11/06/14 04:46 AM, Marcello Caramma (mcaramma) wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Apologies if this is a known issues, but I have found
what I believe
> >>>>> is
> >>>>> a bug in the fixed point implementation of the Silk
codec and could
> >>>>> not
> >>>>> find any mention on this in the archives.
> >>>>>
> >>>>> The bug can be easily reproduced with the fixed point
demo program
> >>>>> (./configure ?enable-fixed-point ?disable-float-api
&& make) using
> the
> >>>>> following command:
> >>>>>
> >>>>> ./opus_demo voip 16000 1 23000 sweep_in.raw
sweep_out.raw
> >>>>>
> >>>>> Where sweep_in.raw is a 30 seconds full scale
frequency sweep from 0
> >>>>> to
> >>>>> 8kHz sampled at 16kHz.
> >>>>>
> >>>>> The first 6 seconds of audio after transcoding sound
Ok. After that
> >>>>> artefacts are introduced all the way to the end of the
file.
> >>>>> The floating point version does not have the issue
(even though the
> >>>>> quality is subjectively worse roughly from the same
point).
> >>>>>
> >>>>> I believe I narrowed down the problem to the file
burg_modified_FIX.c
> >>>>> ?
> >>>>> if I make sure the input signal is scaled down to 14
bits before
> >>>>> processing the coefficients of the predictor are
calculated correctly
> >>>>> and no artefact is introduced.
> >>>>>
> >>>>> Is anyone experiencing the same problem or has a
proper fix for this?
> >>>>> (I
> >>>>> can work around the bug with input scaling for now).
> >>>>>
> >>>>> Thanks and best regards,
> >>>>>
> >>>>> Marcello Caramma
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> opus mailing list
> >>>>> opus at xiph.org
> >>>>> http://lists.xiph.org/mailman/listinfo/opus
> >>>>>
> >>>
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://lists.xiph.org/pipermail/opus/attachments/20140620/519738a7/attachment-0001.htm