Marcello Caramma (mcaramma)
2014-Feb-21 22:54 UTC
[opus] Make check failure on clone from 31 January
I tracked down the bug to an incorrect use of restrict. I would not consider this a compiler bug: we are lying to the optimizer by telling it that a pointer is restrict when in fact it isn't. This can be fixed like so: diff --git a/celt/mdct.c b/celt/mdct.c index 1634e8e..fa5098c 100644 --- a/celt/mdct.c +++ b/celt/mdct.c @@ -276,8 +276,8 @@ void clt_mdct_backward(const mdct_lookup *l, kiss_fft_scalar *in, kiss_fft_scala /* Post-rotate and de-shuffle from both ends of the buffer at once to make it in-place. */ { - kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); - kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; + kiss_fft_scalar * yp0 = out+(overlap>>1); + kiss_fft_scalar * yp1 = out+(overlap>>1)+N2-2; const kiss_twiddle_scalar *t = &trig[0]; /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the middle pair will be computed twice. */ Or like so: diff --git a/celt/mdct.c b/celt/mdct.c index 1634e8e..cdd053f 100644 --- a/celt/mdct.c +++ b/celt/mdct.c @@ -279,9 +279,8 @@ void clt_mdct_backward(const mdct_lookup *l, kiss_fft_scalar *in, kiss_fft_scala kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; const kiss_twiddle_scalar *t = &trig[0]; - /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the - middle pair will be computed twice. */ - for(i=0;i<(N4+1)>>1;i++) + /* Loop to N4>>1 to make sure pointers never overlap */ + for(i=0;i<N4>>1;i++) { kiss_fft_scalar re, im, yr, yi; kiss_twiddle_scalar t0, t1; @@ -309,6 +308,20 @@ void clt_mdct_backward(const mdct_lookup *l, kiss_fft_scalar *in, kiss_fft_scala yp0 += 2; yp1 -= 2; } + // Handle the odd case + if(N4 & 1) + { + kiss_fft_scalar re, im; + kiss_twiddle_scalar t0, t1; + /* We swap real and imag because we're using an FFT instead of an IFFT. */ + re = yp0[1]; + im = yp0[0]; + t0 = t[i]; + t1 = t[N4+i]; + /* We'd scale up by 2 here, but instead it's done when mixing the windows */ + yp0[0] = S_MUL(re,t0) + S_MUL(im,t1); + yp0[1] = S_MUL(re,t1) - S_MUL(im,t0); + } } /* Mirror on both sides for TDAC */ Regards, Marcello On 05/02/2014 18:46, "Gregory Maxwell" <gmaxwell at gmail.com> wrote:>On Wed, Feb 5, 2014 at 8:05 AM, Marcello Caramma (mcaramma) ><mcaramma at cisco.com> wrote: >> Hi, >> >> Apologies if this is a known issue, but running make on revision >>e3187444692195957eb66989622c7b1ad8448b06 fails one of the tests when >>using fixed point configuration (floating point is ok) on my linux x86. >> Note that libopus1.1, as extracted from the tar ball, is OK. > >I can't reproduce with Fedora 19, gcc version 4.8.2 20131212 (Red Hat >4.8.2-7) (GCC) compiled with >CFLAGS='-m32 -O2 -g' ./configure --enable-fixed-point ; make clean ; >make celt/tests/test_unit_mdct > >or with Clang 3.5 on the same system. > >None of the fixed point builds on or ci system are throwing errors on >this either. > >Can you try compiling without optimizations? Smells like a compiler bug.
Good catch! I think gcc is just unable to make use of restrict for anything useful. Apparently your compiler does, so could you benchmark both patches below and see if it makes a difference in terms of speed. If it's worth it, I can keep the restrict and add the odd case handling, but if not I'll just remove the restrict. Thanks, Jean-Marc On 21/02/14 05:54 PM, Marcello Caramma (mcaramma) wrote:> I tracked down the bug to an incorrect use of restrict. > > I would not consider this a compiler bug: we are lying to the optimizer by > telling it that a pointer is restrict when in fact it isn't. > > This can be fixed like so: > > > diff --git a/celt/mdct.c b/celt/mdct.c > index 1634e8e..fa5098c 100644 > --- a/celt/mdct.c > +++ b/celt/mdct.c > @@ -276,8 +276,8 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > /* Post-rotate and de-shuffle from both ends of the buffer at once to > make > it in-place. */ > { > - kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); > - kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; > + kiss_fft_scalar * yp0 = out+(overlap>>1); > + kiss_fft_scalar * yp1 = out+(overlap>>1)+N2-2; > const kiss_twiddle_scalar *t = &trig[0]; > /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the > middle pair will be computed twice. */ > > > Or like so: > > > diff --git a/celt/mdct.c b/celt/mdct.c > index 1634e8e..cdd053f 100644 > --- a/celt/mdct.c > +++ b/celt/mdct.c > @@ -279,9 +279,8 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); > kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; > const kiss_twiddle_scalar *t = &trig[0]; > - /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the > - middle pair will be computed twice. */ > - for(i=0;i<(N4+1)>>1;i++) > + /* Loop to N4>>1 to make sure pointers never overlap */ > + for(i=0;i<N4>>1;i++) > { > kiss_fft_scalar re, im, yr, yi; > kiss_twiddle_scalar t0, t1; > @@ -309,6 +308,20 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > yp0 += 2; > yp1 -= 2; > } > + // Handle the odd case > + if(N4 & 1) > + { > + kiss_fft_scalar re, im; > + kiss_twiddle_scalar t0, t1; > + /* We swap real and imag because we're using an FFT instead of > an IFFT. */ > + re = yp0[1]; > + im = yp0[0]; > + t0 = t[i]; > + t1 = t[N4+i]; > + /* We'd scale up by 2 here, but instead it's done when mixing > the windows */ > + yp0[0] = S_MUL(re,t0) + S_MUL(im,t1); > + yp0[1] = S_MUL(re,t1) - S_MUL(im,t0); > + } > } > > /* Mirror on both sides for TDAC */ > > > Regards, > > Marcello > > > > On 05/02/2014 18:46, "Gregory Maxwell" <gmaxwell at gmail.com> wrote: > >> On Wed, Feb 5, 2014 at 8:05 AM, Marcello Caramma (mcaramma) >> <mcaramma at cisco.com> wrote: >>> Hi, >>> >>> Apologies if this is a known issue, but running make on revision >>> e3187444692195957eb66989622c7b1ad8448b06 fails one of the tests when >>> using fixed point configuration (floating point is ok) on my linux x86. >>> Note that libopus1.1, as extracted from the tar ball, is OK. >> >> I can't reproduce with Fedora 19, gcc version 4.8.2 20131212 (Red Hat >> 4.8.2-7) (GCC) compiled with >> CFLAGS='-m32 -O2 -g' ./configure --enable-fixed-point ; make clean ; >> make celt/tests/test_unit_mdct >> >> or with Clang 3.5 on the same system. >> >> None of the fixed point builds on or ci system are throwing errors on >> this either. >> >> Can you try compiling without optimizations? Smells like a compiler bug. > > _______________________________________________ > opus mailing list > opus at xiph.org > http://lists.xiph.org/mailman/listinfo/opus >
Marcello Caramma (mcaramma)
2014-Feb-24 16:03 UTC
[opus] Make check failure on clone from 31 January
After a few experiments, I found that both alternatives are very similar, and 2~5% slower compared to the following: diff --git a/celt/mdct.c b/celt/mdct.c index 1634e8e..e490c3b 100644 --- a/celt/mdct.c +++ b/celt/mdct.c @@ -277,7 +277,7 @@ void clt_mdct_backward(const mdct_lookup *l, kiss_fft_scalar *in, kiss_fft_scala it in-place. */ { kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); - kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; + kiss_fft_scalar * yp1 = yp0+N2-2; const kiss_twiddle_scalar *t = &trig[0]; /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the middle pair will be computed twice. */ This should always be safe because by making yp1 directly derived from yp0 the compiler should be able to determine that the addresses could overlap. I am not sure why this is faster than removing restrict from both, though. Mysteries of gcc I guess... I also tried replacing yp1[k] with yp0[k+N2-2-4*i] with k={1,2}, but that seemed to make no measurable difference. Also, replacing the loop with a first loop (to N4) for the rotation and a second loop (to N4>>1) for the shuffle made no measurable impact with my compiler (i.e. same as the result after the last suggested patch). It might still be worth considering, as I personally found that easier to read (it also avoids calculating the central value twice for odd N4). Marcello ________________________________________ From: Jean-Marc Valin [jmvalin at jmvalin.ca] Sent: 22 February 2014 15:56 To: Marcello Caramma (mcaramma); opus at xiph.org Subject: Re: [opus] Make check failure on clone from 31 January Good catch! I think gcc is just unable to make use of restrict for anything useful. Apparently your compiler does, so could you benchmark both patches below and see if it makes a difference in terms of speed. If it's worth it, I can keep the restrict and add the odd case handling, but if not I'll just remove the restrict. Thanks, Jean-Marc On 21/02/14 05:54 PM, Marcello Caramma (mcaramma) wrote:> I tracked down the bug to an incorrect use of restrict. > > I would not consider this a compiler bug: we are lying to the optimizer by > telling it that a pointer is restrict when in fact it isn't. > > This can be fixed like so: > > > diff --git a/celt/mdct.c b/celt/mdct.c > index 1634e8e..fa5098c 100644 > --- a/celt/mdct.c > +++ b/celt/mdct.c > @@ -276,8 +276,8 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > /* Post-rotate and de-shuffle from both ends of the buffer at once to > make > it in-place. */ > { > - kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); > - kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; > + kiss_fft_scalar * yp0 = out+(overlap>>1); > + kiss_fft_scalar * yp1 = out+(overlap>>1)+N2-2; > const kiss_twiddle_scalar *t = &trig[0]; > /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the > middle pair will be computed twice. */ > > > Or like so: > > > diff --git a/celt/mdct.c b/celt/mdct.c > index 1634e8e..cdd053f 100644 > --- a/celt/mdct.c > +++ b/celt/mdct.c > @@ -279,9 +279,8 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > kiss_fft_scalar * OPUS_RESTRICT yp0 = out+(overlap>>1); > kiss_fft_scalar * OPUS_RESTRICT yp1 = out+(overlap>>1)+N2-2; > const kiss_twiddle_scalar *t = &trig[0]; > - /* Loop to (N4+1)>>1 to handle odd N4. When N4 is odd, the > - middle pair will be computed twice. */ > - for(i=0;i<(N4+1)>>1;i++) > + /* Loop to N4>>1 to make sure pointers never overlap */ > + for(i=0;i<N4>>1;i++) > { > kiss_fft_scalar re, im, yr, yi; > kiss_twiddle_scalar t0, t1; > @@ -309,6 +308,20 @@ void clt_mdct_backward(const mdct_lookup *l, > kiss_fft_scalar *in, kiss_fft_scala > yp0 += 2; > yp1 -= 2; > } > + // Handle the odd case > + if(N4 & 1) > + { > + kiss_fft_scalar re, im; > + kiss_twiddle_scalar t0, t1; > + /* We swap real and imag because we're using an FFT instead of > an IFFT. */ > + re = yp0[1]; > + im = yp0[0]; > + t0 = t[i]; > + t1 = t[N4+i]; > + /* We'd scale up by 2 here, but instead it's done when mixing > the windows */ > + yp0[0] = S_MUL(re,t0) + S_MUL(im,t1); > + yp0[1] = S_MUL(re,t1) - S_MUL(im,t0); > + } > } > > /* Mirror on both sides for TDAC */ > > > Regards, > > Marcello > > > > On 05/02/2014 18:46, "Gregory Maxwell" <gmaxwell at gmail.com> wrote: > >> On Wed, Feb 5, 2014 at 8:05 AM, Marcello Caramma (mcaramma) >> <mcaramma at cisco.com> wrote: >>> Hi, >>> >>> Apologies if this is a known issue, but running make on revision >>> e3187444692195957eb66989622c7b1ad8448b06 fails one of the tests when >>> using fixed point configuration (floating point is ok) on my linux x86. >>> Note that libopus1.1, as extracted from the tar ball, is OK. >> >> I can't reproduce with Fedora 19, gcc version 4.8.2 20131212 (Red Hat >> 4.8.2-7) (GCC) compiled with >> CFLAGS='-m32 -O2 -g' ./configure --enable-fixed-point ; make clean ; >> make celt/tests/test_unit_mdct >> >> or with Clang 3.5 on the same system. >> >> None of the fixed point builds on or ci system are throwing errors on >> this either. >> >> Can you try compiling without optimizations? Smells like a compiler bug. > > _______________________________________________ > opus mailing list > opus at xiph.org > http://lists.xiph.org/mailman/listinfo/opus >
Apparently Analagous Threads
- Make check failure on clone from 31 January
- Make check failure on clone from 31 January
- [RFC PATCH v1] armv7(float): Optimize decode usecase using NE10 library
- [RFC PATCH v1 2/8] armv7(float): Optimize decode usecase using NE10 library
- Make check failure on clone from 31 January