Thanks, Jean-Marc and Jonathan!

I tested the current Opus encoder in floating-point mode at complexity 8.
Hacking it with the attached patch (which generates
"#define OPUS_ARM_MAY_HAVE_NEON 1" in config.h) speeds it up by about 14.7%
on my Chromebook. This is probably because many of the NEON intrinsics
optimizations benefit both the fixed-point and the floating-point encoder.

So if it is safe to enable MAY_HAVE_NEON for floating-point builds by
default, that would speed up the floating-point NEON encoder a little.

Thanks,
Linfeng

On Thu, Jun 1, 2017 at 2:22 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:

> On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com> wrote:
>
>> Hi,
>>
>> ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf
>> --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3
>> --disable-shared
>>
>> When configuring with floating-point and intrinsics enabled as above, the
>> generated config.h only has OPUS_ARM_MAY_HAVE_NEON_INTR defined (to 1),
>> with
>> /* #undef OPUS_ARM_ASM */
>> /* #undef OPUS_ARM_INLINE_ASM */
>> /* #undef OPUS_ARM_INLINE_EDSP */
>> /* #undef OPUS_ARM_INLINE_MEDIA */
>> /* #undef OPUS_ARM_INLINE_NEON */
>> /* #undef OPUS_ARM_MAY_HAVE_EDSP */
>> /* #undef OPUS_ARM_MAY_HAVE_MEDIA */
>> /* #undef OPUS_ARM_MAY_HAVE_NEON */
>> /* #undef OPUS_ARM_PRESUME_AARCH64_NEON_INTR */
>> /* #undef OPUS_ARM_PRESUME_EDSP */
>> /* #undef OPUS_ARM_PRESUME_MEDIA */
>> /* #undef OPUS_ARM_PRESUME_NEON */
>> /* #undef OPUS_ARM_PRESUME_NEON_INTR */
>>
>> So MAY_HAVE_NEON will be defined as the MEDIA version, which eventually
>> falls back to the C functions in the jump table:
>> # define MAY_HAVE_NEON(name) MAY_HAVE_MEDIA(name)
>>
>> Therefore none of the NEON intrinsics optimizations in those jump tables
>> get called for floating-point.
>>
>> Am I missing some options in my configure command, or is the
>> configuration intended to behave this way for floating-point builds?
>>
>> Thanks,
>> Linfeng
>
> The structure of this is pretty tangled and confusing, but what you’ll
> find is that the MAY_HAVE_NEON macro isn’t used in the jump tables for the
> two Neon intrinsics functions (silk_NSQ_noise_shape_feedback_loop_neon
> and celt_pitch_xcorr_float_neon) which are used in a floating-point Neon
> build. See silk/arm/arm_silk_map.c and celt/arm/arm_celt_map.c.
>
> So long as OPUS_ARM_MAY_HAVE_NEON_INTR and OPUS_HAVE_RTCD are set in
> config.h, it’ll pick up those functions, and check for them using RTCD.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: enable_arm_floating_intrinsics.patch
Type: text/x-patch
Size: 962 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20170601/7ec9a5c6/attachment-0001.bin>
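
The fallback described in the quoted question can be illustrated with a small C sketch. Only the line "# define MAY_HAVE_NEON(name) MAY_HAVE_MEDIA(name)" comes from the thread; the EDSP and MEDIA levels of the cascade, the table order, and the names demo_kernel_c, demo_kernel_neon, DEMO_KERNEL_IMPL and demo_dispatch are assumptions made for illustration, not code taken from the Opus tree.

```c
#include <assert.h>

/* As in the config.h quoted above: only the intrinsics flag is defined.
   OPUS_ARM_MAY_HAVE_EDSP, _MEDIA and _NEON are deliberately left undefined. */
#define OPUS_ARM_MAY_HAVE_NEON_INTR 1

/* Assumed cascade: each level falls back to the next lower one. */
#if defined(OPUS_ARM_MAY_HAVE_EDSP)
# define MAY_HAVE_EDSP(name) name ## _edsp
#else
# define MAY_HAVE_EDSP(name) name ## _c
#endif

#if defined(OPUS_ARM_MAY_HAVE_MEDIA)
# define MAY_HAVE_MEDIA(name) name ## _media
#else
# define MAY_HAVE_MEDIA(name) MAY_HAVE_EDSP(name)
#endif

#if defined(OPUS_ARM_MAY_HAVE_NEON)
# define MAY_HAVE_NEON(name) name ## _neon
#else
# define MAY_HAVE_NEON(name) MAY_HAVE_MEDIA(name)
#endif

/* Hypothetical kernel: the C version returns 0, the "NEON" stand-in returns 1. */
int demo_kernel_c(void)    { return 0; }
int demo_kernel_neon(void) { return 1; }

typedef int (*demo_kernel_fn)(void);

/* Jump table indexed by CPU level (assumed order: base, EDSP, media, NEON). */
const demo_kernel_fn DEMO_KERNEL_IMPL[4] = {
  demo_kernel_c,
  MAY_HAVE_EDSP(demo_kernel),
  MAY_HAVE_MEDIA(demo_kernel),
  MAY_HAVE_NEON(demo_kernel)
};

/* Dispatch through the table for a given CPU level (0..3). */
int demo_dispatch(int arch)
{
  assert(arch >= 0 && arch < 4);
  return DEMO_KERNEL_IMPL[arch]();
}
```

With none of the OPUS_ARM_MAY_HAVE_EDSP, _MEDIA or _NEON macros defined, every slot of the table, including the NEON slot, expands to demo_kernel_c, so demo_dispatch(3) runs the C version even though OPUS_ARM_MAY_HAVE_NEON_INTR is set.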
Semantically, OPUS_ARM_MAY_HAVE_NEON is supposed to mean that the compiler
supports, and the CPU may support, Neon assembly code, which isn’t
necessarily the same thing as the compiler supporting Neon intrinsics.
(The Visual Studio ARM compiler, for instance, supports intrinsics but not
assembly.) So I don’t think this patch is the right solution.

Instead, I think the problem is actually that silk/arm/arm_silk_map.c uses
the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were
changed so that the jump tables just listed the _neon versions of the
functions directly, you’d get the speedup you’re looking for.

On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <linfengz at google.com> wrote:

> Thanks, Jean-Marc and Jonathan!
>
> I tested the current Opus encoder in floating-point mode at complexity 8.
> Hacking it with the attached patch (which generates
> "#define OPUS_ARM_MAY_HAVE_NEON 1" in config.h) speeds it up by about
> 14.7% on my Chromebook. This is probably because many of the NEON
> intrinsics optimizations benefit both the fixed-point and the
> floating-point encoder.
>
> So if it is safe to enable MAY_HAVE_NEON for floating-point builds by
> default, that would speed up the floating-point NEON encoder a little.
>
> Thanks,
> Linfeng
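
The suggestion above, listing the _neon versions directly in the jump table instead of going through MAY_HAVE_NEON, might look roughly like the following C sketch. The guard on OPUS_HAVE_RTCD and OPUS_ARM_MAY_HAVE_NEON_INTR follows the conditions named earlier in the thread; the table layout and the names demo_kernel_c, demo_kernel_neon, DEMO_KERNEL_IMPL and demo_dispatch are hypothetical, and demo_kernel_neon is a plain C stand-in for a function that would really be written with Neon intrinsics.

```c
#include <assert.h>

/* Flags assumed to be set by configure for an RTCD + intrinsics build. */
#define OPUS_HAVE_RTCD 1
#define OPUS_ARM_MAY_HAVE_NEON_INTR 1

/* Hypothetical kernel: the C version returns 0, the "NEON" stand-in returns 1. */
int demo_kernel_c(void)    { return 0; }
int demo_kernel_neon(void) { return 1; }

typedef int (*demo_kernel_fn)(void);

#if defined(OPUS_HAVE_RTCD) && defined(OPUS_ARM_MAY_HAVE_NEON_INTR)
/* The NEON slot names the intrinsics function directly, so it does not
   depend on OPUS_ARM_MAY_HAVE_NEON (the assembly flag) being defined. */
const demo_kernel_fn DEMO_KERNEL_IMPL[4] = {
  demo_kernel_c,    /* base  */
  demo_kernel_c,    /* EDSP  */
  demo_kernel_c,    /* media */
  demo_kernel_neon  /* NEON  */
};
#else
/* Without RTCD or intrinsics support, every level uses the C version. */
const demo_kernel_fn DEMO_KERNEL_IMPL[4] = {
  demo_kernel_c, demo_kernel_c, demo_kernel_c, demo_kernel_c
};
#endif

/* Dispatch through the table for a CPU level (0..3) detected at run time. */
int demo_dispatch(int arch)
{
  assert(arch >= 0 && arch < 4);
  return DEMO_KERNEL_IMPL[arch]();
}
```

In this arrangement the intrinsics build and the assembly build are gated by separate flags, which matches the distinction drawn above for compilers that support intrinsics but not assembly.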
Thanks, Jonathan!

I'll fix the MAY_HAVE_NEON() usage in silk/arm/arm_silk_map.c.

Linfeng

On Thu, Jun 1, 2017 at 3:34 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:

> Semantically, OPUS_ARM_MAY_HAVE_NEON is supposed to mean that the compiler
> supports, and the CPU may support, Neon assembly code, which isn’t
> necessarily the same thing as the compiler supporting Neon intrinsics.
> (The Visual Studio ARM compiler, for instance, supports intrinsics but not
> assembly.) So I don’t think this patch is the right solution.
>
> Instead, I think the problem is actually that silk/arm/arm_silk_map.c uses
> the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were
> changed so that the jump tables just listed the _neon versions of the
> functions directly, you’d get the speedup you’re looking for.