Linfeng Zhang
2017-Jun-05 19:28 UTC
[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Jean-Marc,

I attached the new version in inner_prod_5patches_v2.zip, which is synced to
the current master.

For fixed-point ARM, only
0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes
the performance.
For floating-point ARM, only
0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes
the performance.
Patches 1 and 2 are code clean-up and can only affect x86 performance.
Patch 5 has a negligible effect on floating-point ARM performance.

Thanks,
Linfeng

On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> I'll look into your patches. Can you let me know what's the expected
> effect on performance (if any) for each of your patches? Also, are these
> all the patches you intend to merge for 1.2 or are there more upcoming
> ones?
>
> Cheers,
>
> Jean-Marc
>
> On 01/06/17 06:33 PM, Linfeng Zhang wrote:
> > Hi,
> >
> > Attached are 5 patches related to celt_inner_prod()
> > and dual_inner_prod() NEON intrinsics optimization.
> >
> > In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the
> > optimization changed the order of the floating-point inner products,
> > which will change the results. I created
> > celt_inner_prod_neon_float_c_simulation() and
> > dual_inner_prod_neon_float_c_simulation() to simulate the order of
> > floating-point operations in the NEON optimization and compare their
> > results. Sorry that I cannot bound the distance between the original C
> > function and the NEON function by any reasonably small number or
> > ratio. It's easy to create an input for which 0 and 1,000 are both
> > correct results, just by manipulating the inner product order.
> >
> > The total speed gain is about 1.0% for the fixed-point encoder, and
> > 1.8% for the floating-point encoder, at Complexity 8, tested on my
> > Chromebook.
> >
> > Thanks,
> > Linfeng

-------------- next part --------------
A non-text attachment was scrubbed...
Name: inner_prod_5patches_v2.zip
Type: application/zip
Size: 10997 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20170605/c8d5d402/attachment-0001.zip>
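The remark that 0 and 1,000 can both be "correct" results can be illustrated with a small sketch. It is not the code from the patches: `inner_prod_sequential()` and `inner_prod_4lane()` are hypothetical names, and the 4-lane version only imitates one plausible SIMD accumulation order (four partial sums, then a pairwise horizontal add). It assumes IEEE 754 single-precision evaluation with no `-ffast-math`; the inputs are chosen so that every product is exact and only the order of the additions matters.

```c
#include <assert.h>
#include <stdio.h>

/* Reference order: a single accumulator, elements added left to right. */
float inner_prod_sequential(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical 4-lane order: element i goes to partial sum i % 4, and the
 * four partial sums are combined pairwise at the end. */
float inner_prod_4lane(const float *x, const float *y, int n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float sum;
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        lane[0] += x[i]     * y[i];
        lane[1] += x[i + 1] * y[i + 1];
        lane[2] += x[i + 2] * y[i + 2];
        lane[3] += x[i + 3] * y[i + 3];
    }
    sum = (lane[0] + lane[2]) + (lane[1] + lane[3]);
    for (; i < n; i++)              /* leftover elements, if any */
        sum += x[i] * y[i];
    return sum;
}

/* 2^36 has a float spacing of 8192, so adding 1000 to it is lost to
 * rounding in the sequential order but survives in the 4-lane order. */
void print_order_demo(void)
{
    const float x[4] = {68719476736.0f, 1000.0f, -68719476736.0f, 0.0f};
    const float y[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    printf("sequential: %g\n", inner_prod_sequential(x, y, 4));
    printf("4-lane:     %g\n", inner_prod_4lane(x, y, 4));
}
```

Under the stated assumptions, calling `print_order_demo()` from `main()` prints 0 for the sequential order and 1000 for the 4-lane order. Both are legitimate roundings of the same inner product, which is why no fixed absolute or relative bound separates the two implementations for arbitrary input.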
Linfeng Zhang
2017-Jun-05 19:31 UTC
[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Yes, we'll have one more patch set, related to xcorr, next week. Please
don't wait for it if that's too late for the 1.2 release.

Thanks,
Linfeng

On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Also, are these all the patches you intend to merge for 1.2 or are there
> more upcoming ones?
Jean-Marc Valin
2017-Jun-05 19:49 UTC
[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
On 05/06/17 03:28 PM, Linfeng Zhang wrote:
> For fixed-point ARM, only
> 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes
> the performance.
> For floating-point ARM, only
> 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes
> the performance.

Got any numbers?

Cheers,

Jean-Marc
Linfeng Zhang
2017-Jun-05 19:54 UTC
[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
About 1% speed gain for fixed-point, and 1.5% for floating-point.

Thanks,
Linfeng

On Mon, Jun 5, 2017 at 12:49 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> On 05/06/17 03:28 PM, Linfeng Zhang wrote:
> > For fixed-point ARM, only
> > 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes
> > the performance.
> > For floating-point ARM, only
> > 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes
> > the performance.
>
> Got any numbers?
>
> Cheers,
>
> Jean-Marc
Jean-Marc Valin
2017-Jun-06 03:43 UTC
[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng,

On 05/06/17 03:31 PM, Linfeng Zhang wrote:
> Yes we'll have one more patch set related to xcorr in next week. Please
> don't wait if it's too late for 1.2 release.

Assuming there's no issue with the patches, next week isn't too late.

Also, I've started looking at your patches. So far there's one thing that
puzzles me a bit. In the OPUS_CHECK_ASM section of patch 0004, you have:

+    celt_assert(ABS32(xy1_c - *xy1) <= VERY_SMALL);

Given the normal range of the values (the xy values are often much larger
than one) and the precision involved here (24-bit mantissa), it seems like
this test can only succeed if the two values are actually equal. Is the
float patch actually bit-exact? If so, then maybe you should be using
actual equality. If not, then I guess we need to find the right condition
(which isn't obvious for floating point).

Cheers,

Jean-Marc
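The point about the 24-bit mantissa can be checked numerically with a minimal sketch. Everything named here is an assumption made for illustration: `VERY_SMALL_ASSUMED` is set to 1e-30f, which I believe matches the float-build definition of VERY_SMALL but which should be checked against the source tree, and `next_float_up()`, `close_abs()` and `close_rel()` are hypothetical helpers rather than Opus functions. The code assumes `float` is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed value of VERY_SMALL in float builds; verify against the tree. */
#define VERY_SMALL_ASSUMED 1e-30f

/* Next representable float above v. Valid for positive, finite v only. */
float next_float_up(float v)
{
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    bits += 1;                      /* adjacent bit pattern = adjacent float */
    memcpy(&v, &bits, sizeof v);
    return v;
}

float abs_diff(float a, float b)
{
    return a > b ? a - b : b - a;
}

/* Absolute-tolerance check, same shape as the assertion in patch 0004. */
int close_abs(float a, float b)
{
    return abs_diff(a, b) <= VERY_SMALL_ASSUMED;
}

/* Hypothetical alternative: tolerance scaled by the larger magnitude. */
int close_rel(float a, float b, float rel_tol)
{
    float ma = a < 0.0f ? -a : a;
    float mb = b < 0.0f ? -b : b;
    float scale = ma > mb ? ma : mb;
    return abs_diff(a, b) <= rel_tol * scale;
}
```

Near 1000.0f, adjacent floats are 2^-14 (about 6.1e-5) apart, so `close_abs(1000.0f, next_float_up(1000.0f))` is false: for values of that size the absolute check passes only when the two results are identical. `close_rel()` with a tolerance such as 1e-6f accepts a one-step difference, but it is not offered as the right condition; when large terms cancel, the two summation orders can differ by far more than any fixed relative tolerance.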