Marcello Caramma (mcaramma)
2014-Feb-21 22:54 UTC
[opus] Make check failure on clone from 31 January
I tracked down the bug to an incorrect use of restrict.
I would not consider this a compiler bug: we are lying to the optimizer by
telling it that a pointer is restrict when in fact it isn't.
This can be fixed like so:
diff --git a/celt/mdct.c b/celt/mdct.c
index 1634e8e..fa5098c 100644
--- a/celt/mdct.c
+++ b/celt/mdct.c
@@ -276,8 +276,8 @@ void clt_mdct_backward(const mdct_lookup *l,
kiss_fft_scalar *in, kiss_fft_scala
/* Post-rotate and de-shuffle from both ends of the buffer at once to
make
it in-place. */
{
- kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1);
- kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2;
+ kiss_fft_scalar * yp0 = out+(overlap>>1);
+ kiss_fft_scalar * yp1 = out+(overlap>>1)+N2-2;
const kiss_twiddle_scalar *t = &trig[0];
/* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the
middle pair will be computed twice. */
Or like so:
diff --git a/celt/mdct.c b/celt/mdct.c
index 1634e8e..cdd053f 100644
--- a/celt/mdct.c
+++ b/celt/mdct.c
@@ -279,9 +279,8 @@ void clt_mdct_backward(const mdct_lookup *l,
kiss_fft_scalar *in, kiss_fft_scala
kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1);
kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2;
const kiss_twiddle_scalar *t = &trig[0];
- /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the
- middle pair will be computed twice. */
- for(i=0;i<(N4+1)>>1;i++)
+ /* Loop to N4>>1 to make sure pointers never overlap */
+ for(i=0;i<N4>>1;i++)
{
kiss_fft_scalar re, im, yr, yi;
kiss_twiddle_scalar t0, t1;
@@ -309,6 +308,20 @@ void clt_mdct_backward(const mdct_lookup *l,
kiss_fft_scalar *in, kiss_fft_scala
yp0 += 2;
yp1 -= 2;
}
+ // Handle the odd case
+ if(N4 & 1)
+ {
+ kiss_fft_scalar re, im;
+ kiss_twiddle_scalar t0, t1;
+ /* We swap real and imag because we're using an FFT instead of
an IFFT. */
+ re = yp0[1];
+ im = yp0[0];
+ t0 = t[i];
+ t1 = t[N4+i];
+ /* We'd scale up by 2 here, but instead it's done when mixing
the windows */
+ yp0[0] = S_MUL(re,t0) + S_MUL(im,t1);
+ yp0[1] = S_MUL(re,t1) - S_MUL(im,t0);
+ }
}
/* Mirror on both sides for TDAC */
Regards,
Marcello
On 05/02/2014 18:46, "Gregory Maxwell" <gmaxwell at gmail.com>
wrote:
>On Wed, Feb 5, 2014 at 8:05 AM, Marcello Caramma (mcaramma)
><mcaramma at cisco.com> wrote:
>> Hi,
>>
>> Apologies if this is a known issue, but running make on revision
>>e3187444692195957eb66989622c7b1ad8448b06 fails one of the tests when
>>using fixed point configuration (floating point is ok) on my linux x86.
>> Note that libopus1.1, as extracted from the tar ball, is OK.
>
>I can't reproduce with Fedora 19, gcc version 4.8.2 20131212 (Red Hat
>4.8.2-7) (GCC) compiled with
>CFLAGS='-m32 -O2 -g' ./configure --enable-fixed-point ; make clean
;
>make celt/tests/test_unit_mdct
>
>or with Clang 3.5 on the same system.
>
>None of the fixed point builds on or ci system are throwing errors on
>this either.
>
>Can you try compiling without optimizations? Smells like a compiler bug.
Good catch! I think gcc is just unable to make use of restrict for anything useful. Apparently your compiler does, so could you benchmark both patches below and see if it makes a difference in terms of speed. If it's worth it, I can keep the restrict and add the odd case handling, but if not I'll just remove the restrict. Thanks, Jean-Marc On 21/02/14 05:54 PM, Marcello Caramma (mcaramma) wrote:> I tracked down the bug to an incorrect use of restrict. > > I would not consider this a compiler bug: we are lying to the optimizer by > telling it that a pointer is restrict when in fact it isn't. > > This can be fixed like so: > > > diff --git a/celt/mdct.c b/celt/mdct.c > index 1634e8e..fa5098c 100644 > --- a/celt/mdct.c > +++ b/celt/mdct.c > @@ -276,8 +276,8 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > /* Post-rotate and de-shuffle from both ends of the buffer at once to > make > it in-place. */ > { > - kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); > - kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; > + kiss_fft_scalar * yp0 = out+(overlap>>1); > + kiss_fft_scalar * yp1 = out+(overlap>>1)+N2-2; > const kiss_twiddle_scalar *t = &trig[0]; > /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the > middle pair will be computed twice. */ > > > Or like so: > > > diff --git a/celt/mdct.c b/celt/mdct.c > index 1634e8e..cdd053f 100644 > --- a/celt/mdct.c > +++ b/celt/mdct.c > @@ -279,9 +279,8 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); > kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; > const kiss_twiddle_scalar *t = &trig[0]; > - /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the > - middle pair will be computed twice. */ > - for(i=0;i<(N4+1)>>1;i++) > + /* Loop to N4>>1 to make sure pointers never overlap */ > + for(i=0;i<N4>>1;i++) > { > kiss_fft_scalar re, im, yr, yi; > kiss_twiddle_scalar t0, t1; > @@ -309,6 +308,20 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > yp0 += 2; > yp1 -= 2; > } > + // Handle the odd case > + if(N4 & 1) > + { > + kiss_fft_scalar re, im; > + kiss_twiddle_scalar t0, t1; > + /* We swap real and imag because we're using an FFT instead of > an IFFT. */ > + re = yp0[1]; > + im = yp0[0]; > + t0 = t[i]; > + t1 = t[N4+i]; > + /* We'd scale up by 2 here, but instead it's done when mixing > the windows */ > + yp0[0] = S_MUL(re,t0) + S_MUL(im,t1); > + yp0[1] = S_MUL(re,t1) - S_MUL(im,t0); > + } > } > > /* Mirror on both sides for TDAC */ > > > Regards, > > Marcello > > > > On 05/02/2014 18:46, "Gregory Maxwell" <gmaxwell at gmail.com> wrote: > >> On Wed, Feb 5, 2014 at 8:05 AM, Marcello Caramma (mcaramma) >> <mcaramma at cisco.com> wrote: >>> Hi, >>> >>> Apologies if this is a known issue, but running make on revision >>> e3187444692195957eb66989622c7b1ad8448b06 fails one of the tests when >>> using fixed point configuration (floating point is ok) on my linux x86. >>> Note that libopus1.1, as extracted from the tar ball, is OK. >> >> I can't reproduce with Fedora 19, gcc version 4.8.2 20131212 (Red Hat >> 4.8.2-7) (GCC) compiled with >> CFLAGS='-m32 -O2 -g' ./configure --enable-fixed-point ; make clean ; >> make celt/tests/test_unit_mdct >> >> or with Clang 3.5 on the same system. >> >> None of the fixed point builds on or ci system are throwing errors on >> this either. >> >> Can you try compiling without optimizations? Smells like a compiler bug. > > _______________________________________________ > opus mailing list > opus at xiph.org > http://lists.xiph.org/mailman/listinfo/opus >
Marcello Caramma (mcaramma)
2014-Feb-24 16:03 UTC
[opus] Make check failure on clone from 31 January
After a few experiments, I found that both alternatives are very similar, and
2~5% slower compared to the following:
diff --git a/celt/mdct.c b/celt/mdct.c
index 1634e8e..e490c3b 100644
--- a/celt/mdct.c
+++ b/celt/mdct.c
@@ -277,7 +277,7 @@ void clt_mdct_backward(const mdct_lookup *l, kiss_fft_scalar
*in, kiss_fft_scala
it in-place. */
{
kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1);
- kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2;
+ kiss_fft_scalar * yp1 = yp0+N2-2;
const kiss_twiddle_scalar *t = &trig[0];
/* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the
middle pair will be computed twice. */
This should always be safe because by making yp1 directly derived from yp0 the
compiler should be able to determine that the addresses could overlap.
I am not sure why this is faster than removing restrict from both, though.
Mysteries of gcc I guess...
I also tried replacing yp1[k] with yp0[k+N2-2-4*i] with k={1,2}, but that seemed
to make no measurable difference.
Also, replacing the loop with a first loop (to N4) for the rotation and a second
loop (to N4>>1) for the shuffle made no measurable impact with my compiler
(i.e. same as the result after the last suggested patch). It might still be
worth considering, as I personally found that easier to read (it also avoids
calculating the central value twice for odd N4).
Marcello
________________________________________
From: Jean-Marc Valin [jmvalin at jmvalin.ca]
Sent: 22 February 2014 15:56
To: Marcello Caramma (mcaramma); opus at xiph.org
Subject: Re: [opus] Make check failure on clone from 31 January
Good catch! I think gcc is just unable to make use of restrict for
anything useful. Apparently your compiler does, so could you benchmark
both patches below and see if it makes a difference in terms of speed.
If it's worth it, I can keep the restrict and add the odd case handling,
but if not I'll just remove the restrict.
Thanks,
Jean-Marc
On 21/02/14 05:54 PM, Marcello Caramma (mcaramma) wrote:> I tracked down the bug to an incorrect use of restrict.
>
> I would not consider this a compiler bug: we are lying to the optimizer by
> telling it that a pointer is restrict when in fact it isn't.
>
> This can be fixed like so:
>
>
> diff --git a/celt/mdct.c b/celt/mdct.c
> index 1634e8e..fa5098c 100644
> --- a/celt/mdct.c
> +++ b/celt/mdct.c
> @@ -276,8 +276,8 @@ void clt_mdct_backward(const mdct_lookup *l,
> kiss_fft_scalar *in, kiss_fft_scala
> /* Post-rotate and de-shuffle from both ends of the buffer at once to
> make
> it in-place. */
> {
> - kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1);
> - kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2;
> + kiss_fft_scalar * yp0 = out+(overlap>>1);
> + kiss_fft_scalar * yp1 = out+(overlap>>1)+N2-2;
> const kiss_twiddle_scalar *t = &trig[0];
> /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the
> middle pair will be computed twice. */
>
>
> Or like so:
>
>
> diff --git a/celt/mdct.c b/celt/mdct.c
> index 1634e8e..cdd053f 100644
> --- a/celt/mdct.c
> +++ b/celt/mdct.c
> @@ -279,9 +279,8 @@ void clt_mdct_backward(const mdct_lookup *l,
> kiss_fft_scalar *in, kiss_fft_scala
> kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1);
> kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2;
> const kiss_twiddle_scalar *t = &trig[0];
> - /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the
> - middle pair will be computed twice. */
> - for(i=0;i<(N4+1)>>1;i++)
> + /* Loop to N4>>1 to make sure pointers never overlap */
> + for(i=0;i<N4>>1;i++)
> {
> kiss_fft_scalar re, im, yr, yi;
> kiss_twiddle_scalar t0, t1;
> @@ -309,6 +308,20 @@ void clt_mdct_backward(const mdct_lookup *l,
> kiss_fft_scalar *in, kiss_fft_scala
> yp0 += 2;
> yp1 -= 2;
> }
> + // Handle the odd case
> + if(N4 & 1)
> + {
> + kiss_fft_scalar re, im;
> + kiss_twiddle_scalar t0, t1;
> + /* We swap real and imag because we're using an FFT instead
of
> an IFFT. */
> + re = yp0[1];
> + im = yp0[0];
> + t0 = t[i];
> + t1 = t[N4+i];
> + /* We'd scale up by 2 here, but instead it's done when
mixing
> the windows */
> + yp0[0] = S_MUL(re,t0) + S_MUL(im,t1);
> + yp0[1] = S_MUL(re,t1) - S_MUL(im,t0);
> + }
> }
>
> /* Mirror on both sides for TDAC */
>
>
> Regards,
>
> Marcello
>
>
>
> On 05/02/2014 18:46, "Gregory Maxwell" <gmaxwell at
gmail.com> wrote:
>
>> On Wed, Feb 5, 2014 at 8:05 AM, Marcello Caramma (mcaramma)
>> <mcaramma at cisco.com> wrote:
>>> Hi,
>>>
>>> Apologies if this is a known issue, but running make on revision
>>> e3187444692195957eb66989622c7b1ad8448b06 fails one of the tests
when
>>> using fixed point configuration (floating point is ok) on my linux
x86.
>>> Note that libopus1.1, as extracted from the tar ball, is OK.
>>
>> I can't reproduce with Fedora 19, gcc version 4.8.2 20131212 (Red
Hat
>> 4.8.2-7) (GCC) compiled with
>> CFLAGS='-m32 -O2 -g' ./configure --enable-fixed-point ; make
clean ;
>> make celt/tests/test_unit_mdct
>>
>> or with Clang 3.5 on the same system.
>>
>> None of the fixed point builds on or ci system are throwing errors on
>> this either.
>>
>> Can you try compiling without optimizations? Smells like a compiler
bug.
>
> _______________________________________________
> opus mailing list
> opus at xiph.org
> http://lists.xiph.org/mailman/listinfo/opus
>
Apparently Analagous Threads
- Make check failure on clone from 31 January
- Make check failure on clone from 31 January
- [RFC PATCH v1] armv7(float): Optimize decode usecase using NE10 library
- [RFC PATCH v1 2/8] armv7(float): Optimize decode usecase using NE10 library
- Make check failure on clone from 31 January