Jean-Marc Valin
2017-Jun-06 20:15 UTC
[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng,

On 06/06/17 04:09 PM, Jonathan Lennox wrote:
> Two comments on the various infrastructure for RTCD etc.
>
> 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions,
> but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s
> correspondingly. I suspect the ‘arch’ parameter can just be ignored
> by the assembly functions, but at least the comments in that file
> should be updated to indicate the register that’s used to pass it in,
> and that it’s ignored.
>
> 2. In the 0003- patch, you shouldn’t use the MAY_HAVE_NEON macro in
> your new arm_celt_map tables, for the same reason we didn’t want it
> in the arm_silk_map tables.

I have no further issues with your patches, so once you address the two
issues Jonathan pointed out, I'll be able to merge them.

Cheers,

Jean-Marc

> Out of curiosity, what’s the CPU in the Chromebook you’re using to
> test?
>
>> On Jun 1, 2017, at 6:33 PM, Linfeng Zhang <linfengz at google.com>
>> wrote:
>>
>> Hi,
>>
>> Attached are 5 patches related to celt_inner_prod() and
>> dual_inner_prod() NEON intrinsics optimization.
>>
>> In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch,
>> the optimization changes the order of the floating-point inner
>> products, which will change the results. I created
>> celt_inner_prod_neon_float_c_simulation() and
>> dual_inner_prod_neon_float_c_simulation() to simulate the order of
>> floating-point operations in the NEON optimization and compare the
>> results. Sorry that I cannot bound the difference between the
>> original C function and the NEON function by any reasonably small
>> number or ratio. It's easy to create an input for which 0 and 1,000
>> are both correct results just by manipulating the inner product
>> order.
>>
>> The total speed gain is about 1.0% for the fixed-point encoder and
>> 1.8% for the floating-point encoder, at complexity 8, tested on my
>> Chromebook.
>>
>> Thanks, Linfeng
>>
>> <0005-Clean-celt_pitch_xcorr_float_neon.patch>
>> <0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch>
>> <0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch>
>> <0002-Replace-call-of-celt_inner_prod_c-step-2.patch>
>> <0001-Replace-call-of-celt_inner_prod_c-step-1.patch>
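The reordering concern in the quoted patch description can be reproduced with a small C sketch. This is not code from the patches: the function names and the demo input are made up for illustration, and the four-lane loop only imitates the general shape of a 4-wide SIMD accumulation, not the actual NEON implementation. It assumes IEEE 754 single-precision floats and a compiler that does not reassociate floating-point math (no -ffast-math).

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one accumulator, terms added in index order. */
float inner_prod_sequential(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    size_t i;
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Four partial sums, combined pairwise at the end. This mimics how a
 * 4-wide vector accumulator groups the terms (hypothetical sketch). */
float inner_prod_4lane(const float *x, const float *y, size_t n)
{
    float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    size_t i, j;
    assert(n % 4 == 0); /* keep the sketch simple: no tail handling */
    for (i = 0; i < n; i += 4) {
        for (j = 0; j < 4; j++) {
            lane[j] += x[i + j] * y[i + j];
        }
    }
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}

/* Demo input: a huge value, a small value, then the huge value negated.
 * In the sequential loop the 1000 is absorbed by 1e30 before the
 * cancellation; in the 4-lane loop the two huge values cancel inside
 * lane 0 and the 1000 survives in lane 1. */
const float demo_x[8] = { 1.0e30f, 1000.0f, 0.0f, 0.0f,
                          -1.0e30f, 0.0f, 0.0f, 0.0f };
const float demo_y[8] = { 1.0f, 1.0f, 1.0f, 1.0f,
                          1.0f, 1.0f, 1.0f, 1.0f };
```

With this input, inner_prod_sequential(demo_x, demo_y, 8) returns 0 and inner_prod_4lane(demo_x, demo_y, 8) returns 1000. The only difference between the two functions is the grouping of the additions, so no small fixed tolerance covers every input.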
Linfeng Zhang
2017-Jun-06 21:04 UTC
[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Thanks, Jonathan and Jean-Marc!

I attached the new patch set in inner_prod_5patches_v3.zip.

The Chromebook I'm using is
Chromebook 13
CB5-311 series
RMN: Z3ENN

CPU info:

$ cat /proc/cpuinfo
processor        : 0
model name       : ARMv7 Processor rev 3 (v7l)
BogoMIPS         : 2.31
Features         : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer  : 0x41
CPU architecture : 7
CPU variant      : 0x3
CPU part         : 0xc0f
CPU revision     : 3

Hardware         : NVIDIA Tegra SoC (Flattened Device Tree)
Revision         : 0000
Serial           : 0000000000000000

Thanks,
Linfeng

-------------- next part --------------
A non-text attachment was scrubbed...
Name: inner_prod_5patches_v3.zip
Type: application/zip
Size: 11294 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20170606/8af9ff04/attachment-0001.zip>
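For readers unfamiliar with RTCD (run-time CPU detection), the following is a minimal sketch of the general idea discussed in this thread: check the Features line for "neon", map that to an arch level, and index a function table with it. Every name here (has_feature, detect_arch, ARCH_C, ARCH_NEON, INNER_PROD_IMPL, inner_prod_neon_stub) is hypothetical and chosen for illustration; this is not the Opus arm_celt_map code or its macros. The portable function also shows an implementation that accepts an 'arch' argument and ignores it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical arch levels for this sketch (not the real Opus values). */
enum { ARCH_C = 0, ARCH_NEON = 1, ARCH_COUNT = 2 };

/* Return 1 if `name` is a whole space-separated token of `features`. */
int has_feature(const char *features, const char *name)
{
    size_t len = strlen(name);
    const char *p = features;
    assert(len > 0);
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == features) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token) {
            return 1;
        }
        p++; /* keep scanning past a partial match such as "vfpv3" */
    }
    return 0;
}

/* Map a Features string to an arch level. */
int detect_arch(const char *features)
{
    return has_feature(features, "neon") ? ARCH_NEON : ARCH_C;
}

/* Every implementation shares one signature, including `arch`. */
typedef long (*inner_prod_fn)(const short *x, const short *y, int n, int arch);

/* Portable version: accepts `arch` but ignores it. */
long inner_prod_c(const short *x, const short *y, int n, int arch)
{
    long sum = 0;
    int i;
    (void)arch;
    for (i = 0; i < n; i++) {
        sum += (long)x[i] * y[i];
    }
    return sum;
}

/* Stand-in for a NEON version; a real one would use NEON intrinsics. */
long inner_prod_neon_stub(const short *x, const short *y, int n, int arch)
{
    return inner_prod_c(x, y, n, arch);
}

/* One entry per arch level, indexed by the detected value. */
const inner_prod_fn INNER_PROD_IMPL[ARCH_COUNT] = {
    inner_prod_c,
    inner_prod_neon_stub
};

/* Dispatch through the table using the detected arch level. */
long inner_prod(const short *x, const short *y, int n, int arch)
{
    assert(arch >= 0 && arch < ARCH_COUNT);
    return INNER_PROD_IMPL[arch](x, y, n, arch);
}
```

Passing the Features string from the cpuinfo output above to detect_arch() yields ARCH_NEON in this sketch, so inner_prod() would go through the second table entry. Matching whole tokens matters: a plain substring search for "vfp" would also match "vfpv3" and "vfpd32".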
Jean-Marc Valin
2017-Jun-06 21:47 UTC
[opus] [OPUS] celt_inner_prod() and dual_inner_prod() NEON intrinsics
Thanks, all 5 patches merged in master.

Jean-Marc