thr3ads.net - opus - [opus] Antw: Re: [OPUS] celt_inner_prod() and dual_inner

Jean-Marc Valin

2017-Jun-06 03:43 UTC

[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

Hi Linfeng,

On 05/06/17 03:31 PM, Linfeng Zhang wrote:
> Yes we'll have one more patch set related to xcorr in next week. Please
> don't wait if it's too late for 1.2 release.

Assuming there's no issue with the patches, next week isn't too late.

Also, I've started looking at your patches. So far there's one thing
that puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you
have:

+        celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL);

Given the normal range of the values (the xy values are often much
larger than one) and the precision involved here (24-bit mantissa), it
seems like this test can only succeed if the two values are actually
equal. Is the float patch actually bit-exact? If so, then maybe you
should be using actual equality. If not, then I guess we need to find
the right condition (which isn't obvious for floating point).
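
A minimal sketch of why such an absolute test collapses into plain equality, assuming 32-bit IEEE 754 floats. The 1e-30f threshold is only a stand-in for VERY_SMALL, and the helper names are hypothetical, not part of Opus:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for VERY_SMALL; the exact value is an assumption. */
#define TINY_THRESHOLD 1e-30f

/* Next representable float above a positive, finite x.
 * For such values, adding 1 to the bit pattern gives the neighbour. */
static float next_float_above(float x)
{
    uint32_t bits;
    assert(x > 0.0f);
    memcpy(&bits, &x, sizeof bits);
    bits += 1;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Absolute-difference test in the style of the quoted assertion. */
static int abs_close(float a, float b)
{
    float d = a - b;
    if (d < 0.0f)
        d = -d;
    return d <= TINY_THRESHOLD;
}
```

Near 1000.0f the gap between adjacent floats is about 6e-5, far above the threshold, so abs_close(x, next_float_above(x)) is false and the test only passes when both values are identical.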

Cheers,

	Jean-Marc

> Thanks,
> Linfeng
> 
> On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com>
> wrote:
> 
>     Hi Jean-Marc,
> 
>     I attached the new version in inner_prod_5patches_v2.zip, which
>     is synced to the current master.
> 
>     For fixed-point ARM, only
>     0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch
>     changes the performance.
>     For floating-point ARM, only
>     0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch
>     changes the performance.
>     Patches 1 and 2 are code clean-up and can only affect x86 performance.
>     Patch 5 has a negligible effect on floating-point ARM performance.
> 
>     Thanks,
>     Linfeng
> 
>     On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin
>     <jmvalin at jmvalin.ca> wrote:
> 
>         Hi Linfeng,
> 
>         I'll look into your patches. Can you let me know what the
>         expected effect on performance (if any) is for each of your
>         patches? Also, are these all the patches you intend to merge
>         for 1.2, or are there more upcoming ones?
> 
>         Cheers,
> 
>                 Jean-Marc
> 
>         On 01/06/17 06:33 PM, Linfeng Zhang wrote:
>         > Hi,
>         >
>         > Attached are 5 patches related to celt_inner_prod()
>         > and dual_inner_prod() NEON intrinsics optimization.
>         >
>         > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch,
>         > the optimization changed the order of the floating-point inner
>         > products, which will change the results. I created
>         > celt_inner_prod_neon_float_c_simulation() and
>         > dual_inner_prod_neon_float_c_simulation() to simulate the order
>         > of floating-point operations in the NEON optimization and compare
>         > their results. Sorry that I cannot bound the distance between the
>         > original C function and the NEON function by any given reasonably
>         > small number or ratio. It's easy to create an input for which 0
>         > and 1,000 are both correct results just by manipulating the
>         > inner product order.
>         >
>         > The total speed gain is about 1.0% for the fixed-point encoder
>         > and 1.8% for the floating-point encoder, at Complexity 8, tested
>         > on my Chromebook.
>         >
>         > Thanks,
>         > Linfeng
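
The reordering effect described in the original post can be sketched as follows. This is only an illustration: the four-lane accumulation is an assumption about how a SIMD loop might group its partial sums, not the code from the patch, and all names and the demo input are hypothetical.

```c
#include <assert.h>

/* Plain left-to-right inner product, as in a scalar C loop. */
static float inner_prod_sequential(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Four interleaved partial sums, combined pairwise at the end.
 * n must be a multiple of 4. */
static float inner_prod_4lanes(const float *x, const float *y, int n)
{
    float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    int i;
    assert(n >= 0 && n % 4 == 0);
    for (i = 0; i < n; i += 4) {
        lane[0] += x[i + 0] * y[i + 0];
        lane[1] += x[i + 1] * y[i + 1];
        lane[2] += x[i + 2] * y[i + 2];
        lane[3] += x[i + 3] * y[i + 3];
    }
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}

/* Demo input: the two huge terms cancel exactly, but only if they
 * meet before the 1000 has been absorbed into one of them. */
static const float demo_x[8] = { 1e20f, 1000.0f, 0.0f, 0.0f,
                                 -1e20f, 0.0f, 0.0f, 0.0f };
static const float demo_y[8] = { 1.0f, 1.0f, 1.0f, 1.0f,
                                 1.0f, 1.0f, 1.0f, 1.0f };
```

With this input the sequential loop loses the 1000 when it is added to 1e20 and ends at 0, while the lane-wise version cancels the large terms inside lane 0 and ends at 1000. That is why no fixed absolute or relative bound between the two orderings holds for arbitrary inputs.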

Linfeng Zhang

2017-Jun-06 04:46 UTC

[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

Hi Jean-Marc,

I tried "==" before, and it failed when both results were 0.0. Maybe the
exponent or sign differs because of a different 0.0 representation in
NEON. If anybody knows how to handle this 0.0 comparison, that would be
great.
Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this.

Thanks,
Linfeng


Jean-Marc Valin

2017-Jun-06 05:52 UTC

[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

As far as I know, +0 should be equal to -0 in C. And even then, I don't
see a reason two identical pieces of code should give different results
on an IEEE 754-compliant platform (which I believe Neon is). Can you
check what exactly is the case that doesn't match?
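
One way to pin down the mismatching case, sketched under the assumption of 32-bit IEEE 754 floats (all helper names here are hypothetical): log the raw bit patterns of both results whenever the comparison fails. A plain == already treats +0 and -0 as equal, so a failure with "both results 0.0" would point at something else, for example a NaN, which compares unequal even to itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float, suitable for printing with %08x. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse of float_bits(), handy for building specific test values. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Numerical equality as C defines it: +0 equals -0, NaN equals nothing. */
static int same_value(float a, float b)
{
    return a == b;
}

/* A NaN is the only value that differs from itself. */
static int is_nan(float f)
{
    return f != f;
}
```

Printing float_bits() of both operands at the point of failure would show directly whether they differ in the sign bit only, in the last mantissa bit, or in the exponent.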

Cheers,

	Jean-Marc


Ulrich Windl

2017-Jun-06 07:03 UTC

[opus] Antw: Re: [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics

>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46
in message <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>:
> Hi Jean-Marc,
> 
> I tried "==" before, and it failed when both results were 0.0. Maybe the
> exponent or sign differs because of a different 0.0 representation in
> NEON. If anybody knows how to handle this 0.0 comparison, that would be
> great.
> Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this.

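From some faint memory of my math lessons, I once wrote code like this to
estimate the machine precision, i.e. the largest power of two that no
longer changes 1.0 when added to it (note that this is not the smallest
number different from zero, which is far smaller):

double  EPS;            /* largest power of two with 1.0 + EPS == 1.0 */

/* refined estimate of EPS: halve eps until adding it to 1.0 has no effect */
static  double  get_EPS(double eps)
{

        while ( 1.0 + eps != 1.0 )
                eps /= 2;
        return(eps);
}

/* inside a function, e.g. main(): */
EPS = get_EPS(1.0);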

On the x86_64 platform I get:
(gdb) p EPS
$1 = 1.1102230246251565e-16

Maybe it can help...
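
A hedged sketch of how such an epsilon might feed a relative comparison, using the standard constants from <float.h> rather than a computed value. The function names and the choice of tolerance are assumptions for illustration only, and a relative test like this cannot cover inputs with heavy cancellation, where reordering may change the result arbitrarily.

```c
#include <assert.h>
#include <float.h>

/* Absolute value without pulling in the math library. */
static float abs_f(float v)
{
    return v < 0.0f ? -v : v;
}

/* 1 if a and b differ by at most rel_tol times the larger magnitude. */
static int rel_close(float a, float b, float rel_tol)
{
    float diff = abs_f(a - b);
    float scale = abs_f(a) > abs_f(b) ? abs_f(a) : abs_f(b);
    assert(rel_tol >= 0.0f);
    return diff <= rel_tol * scale;
}

/* The same halving idea as the loop above; the volatile store forces
 * each sum to be rounded to double before it is compared. */
static double halving_eps(void)
{
    volatile double sum;
    double eps = 1.0;
    do {
        eps /= 2;
        sum = 1.0 + eps;
    } while (sum != 1.0);
    return eps;
}
```

On a platform with IEEE 754 doubles, halving_eps() should come out as DBL_EPSILON / 2, which matches the 1.11e-16 printed by gdb. For single-precision results a tolerance of a few multiples of FLT_EPSILON, for example rel_close(a, b, 4 * FLT_EPSILON), would be the analogous starting point.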

Regards,
Ulrich