thr3ads.net - opus - [opus] Alleged bug in Silk codec [Jun 2014]

If this information is useful, please help other people find it:
Share via:

KOEN VOS

2014-Jun-20 18:29 UTC

[opus] Alleged bug in Silk codec

Right, there shouldn't be a problem with undefined behavior.

That said, a 64 bit implementation will work very well - in fact that's how
it was done originally.

The reason for the current implementation is to minimize 64-bit operations
in order to improve performance on limited-width architectures.  This
functions gets used extensively, and I think the current implementation is
faster on a 32 or 16 bit processor.  If you would find the opposite to be
true (ie that a 64 bit implementation is faster on, say, a 32 bit ARM CPU)
then perhaps we should reconsider.

Btw, thanks for your help tracking down that bug!
koen.


On Fri, Jun 20, 2014 at 9:46 AM, Jean-Marc Valin <jmvalin at jmvalin.ca>
wrote:
> Hi Marcello,
>
> Actually, we were careful to avoid the undefined behaviour here. In
> fact, we are specifically running a clang test detecting undefined
> behaviour. If you look at the silk_SMLABB_ovflw() macro, you will see it
> is based on silk_ADD32_ovflw(), which is defined as:
>
> #define silk_ADD32_ovflw(a, b) ((opus_int32)((opus_uint32)(a) +
> (opus_uint32)(b)))
>
> By casting to unsigned, all the cases are well defined. The cast back to
> int is implementation-defined but we are already assuming two's
> complement behaviour every time we are shifting.
>
> As for using the 64-bit version anyway, I'm not sure of the impact
since
> I'm not the original author of the code. Koen, what are the advantages
> of silk_sum_sqr_shift() over the 64-bit version? See any issue with
> using one over the other?
>
> Cheers,
>
>         Jean-Marc
>
> On 20/06/14 06:10 AM, Marcello Caramma (mcaramma) wrote:
> > Hi Jean-Marc,
> >
> > well spotted! The patch provided fixes the issue for me.
> >
> > Nevertheless, in my code (and I would suggest doing the same in
libopus)
> I
> > am going to replace the function with one that accumulates on 64 bits
and
> > then calculates the shift, for at least 4 reasons:
> >  - It is less and simpler code
> >  - The result is likely to be slightly more accurate in case big
numbers
> > come early in the array.
> >  - I don?t like to have branching inside loops
> >
> >  - Even though it seems to work on most compilers, assuming that
overflow
> > in the accumulator will lead to a negative result calls for undefined
> > behaviour (NOT implementation defined): if left as is the code should
at
> > least use an unsigned accumulator and replace the test (nrg < 0)
with
> (nrg
> >> MAX_INT32).
> >
> > Can you see any issue in doing so?
> > If you also think it might be a good idea to do the same in the
libopus
> > repository, I can send you the suggested patch.
> >
> > Cheers,
> >
> > Marcello
> >
> >
> > On 18/06/2014 08:09, "Jean-Marc Valin" <jmvalin at
jmvalin.ca> wrote:
> >
> >> Hi Marcello,
> >>
> >> It turns out that the problem has a much simpler explanation. As
far as
> >> I can tell, it's actually a bug in silk_sum_sqr_shift() and
this trivial
> >> patch appears to fix the problem:
> >> http://jmvalin.ca/misc_stuff/sum_sqr_shift_fix.patch
> >>
> >> It would still require some testing to check that the fix
doesn't have
> >> any bad side effect. Let me know how well the fix works for you.
Again,
> >> thanks for investigating this.
> >>
> >> Cheers,
> >>
> >>      Jean-Marc
> >>
> >> On 13/06/14 12:28 PM, Marcello Caramma (mcaramma) wrote:
> >>> Hi Jean Marc,
> >>>
> >>> please find attached the audio file (mono 16khz). I shortened
it to
> >>> about
> >>> 10 seconds. I also add 2 patches that worked for me. Further
info that
> >>> might help:
> >>>
> >>> - The problem seems to be related to silk_burg_modified not
reaching
> the
> >>> maximum gain, so the actual filter order is 16 rather than 2
(which is
> >>> what would be expected with a sine wave).
> >>> - The problem seems to happen when rshifts >= 3
> >>> - when pre-scaling the signal to be < 16384 the problem
goes away
> (patch
> >>> scale_burg_in.diff)
> >>> - When calculating C0 and rshifts based on a 64 bits
correlation
> instead
> >>> of using silk_sum_sqr_shift the problem also goes away (patch
> >>> new_C0_calc.diff)
> >>>
> >>> I suspect that for very high prediction gain the fixed point
> >>> implementation becomes very sensitive to numerical error, but
I am not
> >>> too
> >>> sure why the new versions work better.
> >>> I favour the version with the new C0 calculation, as it avoids
> rescaling
> >>> the input.
> >>> I also played around with a version (not attached) that
prescales the
> >>> input by rshifts/2 - this might be considering as it
simplifies the
> >>> code.
> >>>
> >>> Regards,
> >>>
> >>> Marcello
> >>>
> >>> PS: I am using 1.1 but the same issue is present with the
latest code
> >>> well.
> >>>
> >>>
> >>> On 13/06/2014 06:05, "Jean-Marc Valin" <jmvalin
at jmvalin.ca> wrote:
> >>>
> >>>> Hi Marcello,
> >>>>
> >>>> Thanks for the report. It's hard to debug this without
the actual
> file.
> >>>> Can you please post the sweep_in.raw file you used?
> >>>>
> >>>> Cheers,
> >>>>
> >>>>    Jean-Marc
> >>>>
> >>>> On 11/06/14 04:46 AM, Marcello Caramma (mcaramma) wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Apologies if this is a known issues, but I have found
what I believe
> >>>>> is
> >>>>> a bug in the fixed point implementation of the Silk
codec and could
> >>>>> not
> >>>>> find any mention on this in the archives.
> >>>>>
> >>>>> The bug can be easily reproduced with the fixed point
demo program
> >>>>> (./configure ?enable-fixed-point ?disable-float-api
&& make) using
> the
> >>>>> following command:
> >>>>>
> >>>>> ./opus_demo voip 16000 1 23000 sweep_in.raw
sweep_out.raw
> >>>>>
> >>>>> Where sweep_in.raw is a 30 seconds full scale
frequency sweep from 0
> >>>>> to
> >>>>> 8kHz sampled at 16kHz.
> >>>>>
> >>>>> The first 6 seconds of audio after transcoding sound
Ok. After that
> >>>>> artefacts are introduced all the way to the end of the
file.
> >>>>> The floating point version does not have the issue
(even though the
> >>>>> quality is subjectively worse roughly from the same
point).
> >>>>>
> >>>>> I believe I narrowed down the problem to the file
burg_modified_FIX.c
> >>>>> ?
> >>>>> if I make sure the input signal is scaled down to 14
bits before
> >>>>> processing the coefficients of the predictor are
calculated correctly
> >>>>> and no artefact is introduced.
> >>>>>
> >>>>> Is anyone experiencing the same problem or has a
proper fix for this?
> >>>>> (I
> >>>>> can work around the bug with input scaling for now).
> >>>>>
> >>>>> Thanks and best regards,
> >>>>>
> >>>>> Marcello Caramma
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> opus mailing list
> >>>>> opus at xiph.org
> >>>>> http://lists.xiph.org/mailman/listinfo/opus
> >>>>>
> >>>
> >
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://lists.xiph.org/pipermail/opus/attachments/20140620/519738a7/attachment-0001.htm

Timothy B. Terriberry

2014-Jun-20 19:40 UTC

head link

[opus] Alleged bug in Silk codec

KOEN VOS wrote:> current implementation is faster on a 32 or 16 bit processor.  If you
> would find the opposite to be true (ie that a 64 bit implementation is
> faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider.
Doesn't ARMv6 have a dual signed 16x16->32 multiply with a 64-bit 
accumulator (SMLALD)? Even v5E should have a single 16x16->32 with a 
64-bit accumulator (SMLALBB). I would think a 64-bit version could be 
made pretty fast on 32-bit ARM, without even resorting to SIMD.

KOEN VOS

2014-Jun-20 21:04 UTC

head link

[opus] Alleged bug in Silk codec

Yes those instructions exist, although they're a bit slower than the basic
16x16->32 with 32-bit accumulation (SMLABB).  So I'd be surprised if the
function with 64 bit accumulation would run as fast as the current code.
Don't know how much we care about 16-bit platforms.  And accuracy should
not matter.

On the other hand, a 64-bit implementation is much cleaner/shorter, which
is always a good argument :-)
All in all, no strong preference from my side.
koen.

On Fri, Jun 20, 2014 at 12:40 PM, Timothy B. Terriberry <tterribe at
xiph.org>
wrote:
> KOEN VOS wrote:
>
>> current implementation is faster on a 32 or 16 bit processor.  If you
>> would find the opposite to be true (ie that a 64 bit implementation is
>> faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider.
>>
>
> Doesn't ARMv6 have a dual signed 16x16->32 multiply with a 64-bit
> accumulator (SMLALD)? Even v5E should have a single 16x16->32 with a
64-bit
> accumulator (SMLALBB). I would think a 64-bit version could be made pretty
> fast on 32-bit ARM, without even resorting to SIMD.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://lists.xiph.org/pipermail/opus/attachments/20140620/6fc87274/attachment.htm

Maybe Matching Threads

Search for more maybe matching threads

opus - Jun 2014 - Alleged bug in Silk codec

[opus] Alleged bug in Silk codec

[opus] Alleged bug in Silk codec

[opus] Alleged bug in Silk codec

Maybe Matching Threads