Thank Jonathan! I'll fix the MAY_HAVE_NEON() in silk/arm/arm_silk_map.c Linfeng On Thu, Jun 1, 2017 at 3:34 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:> Semantically, OPUS_ARM_MAY_HAVE_NEON is supposed to mean the compiler > supports, and the CPU may support, Neon assembly code, which isn’t > necessarily the same thing as the compiler supporting Neon intrinsics. > (The Visual Studio ARM compiler, for instance, supports intrinsics but not > assembly.) So I don’t think this patch is the right solution. > > Instead, I think the problem is actually that silk/arm/arm_silk_map.c uses > the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were > changed so that the jump tables just listed the _neon versions of the > functions directly, you’d get the speedup you’re looking for. > > > On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Thank Jean-Mark and Jonathan! > > I tested current OPUS encoder in floating-point with Complexity 8. Hacking > using the attached patch (which will generate "#define > OPUS_ARM_MAY_HAVE_NEON 1" in config.h) will speed up about 14.7% on my > Chromebook. Probably it's because many NEON intrinsics optimizations can > benefit both fixed-point and floating-point encoder. > > So if it's safe enough to enable MAY_HAVE_NEON in floating-point by > default, it could speed up floating-point NEON encoder a little bit. > > Thanks, > Linfeng > > On Thu, Jun 1, 2017 at 2:22 PM, Jonathan Lennox <jonathan at vidyo.com> > wrote: > >> >> On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com> wrote: >> >> Hi, >> >> ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf >> --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 >> --disable-shared >> >> When configuring with floating-point and intrinsics enabled as above, the >> generated config.h only has OPUS_ARM_MAY_HAVE_NEON_INTR defined (to 1), >> with >> /* #undef OPUS_ARM_ASM */ >> /* #undef OPUS_ARM_INLINE_ASM */ >> /* #undef OPUS_ARM_INLINE_EDSP */ >> /* #undef OPUS_ARM_INLINE_MEDIA */ >> /* #undef OPUS_ARM_INLINE_NEON */ >> /* #undef OPUS_ARM_MAY_HAVE_EDSP */ >> /* #undef OPUS_ARM_MAY_HAVE_MEDIA */ >> /* #undef OPUS_ARM_MAY_HAVE_NEON */ >> /* #undef OPUS_ARM_PRESUME_AARCH64_NEON_INTR */ >> /* #undef OPUS_ARM_PRESUME_EDSP */ >> /* #undef OPUS_ARM_PRESUME_MEDIA */ >> /* #undef OPUS_ARM_PRESUME_NEON */ >> /* #undef OPUS_ARM_PRESUME_NEON_INTR */ >> >> So MAY_HAVE_NEON will be defined to MEDIA version, which will eventually >> fall down to C functions in the jump table: >> # define MAY_HAVE_NEON(name) MAY_HAVE_MEDIA(name) >> >> Therefore all NEON intrinsics optimizations in their jump tables won't >> get called for floating-point. >> >> Am I missing some options in my configure command, or the config is >> intend to do so in floating-point? >> >> Thanks, >> Linfeng >> >> >> The structure of this is pretty tangled and confusing, but what you’ll >> find is that the MAY_HAVE_NEON macro isn’t used in the jump tables for the >> two Neon intrinsics functions (silk_NSQ_noise_shape_feedback_loop_neon >> and celt_pitch_xcorr_float_neon) which are used in a floating-point neon >> build. See silk/arm/arm_silk_map.c and celt/arm/arm_celt_map.c. >> >> So long as OPUS_ARM_MAY_HAVE_NEON_INTR and OPUS_HAVE_RTCD are set in >> config.h, it’ll pick up those functions, and check for them using RTCD. >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170601/2be9ea3a/attachment-0001.html>
Jonathan Lennox
2017-Jun-02 19:53 UTC
[opus] [PATCH] Don't use MAY_HAVE_NEON in arm_silk_map.c.
It's unnecessary, and isn't defined correctly on floating-point. This makes us correctly use Neon functions (in floating-point mode) on platforms where Neon is detected by RTCD. --- silk/arm/arm_silk_map.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/silk/arm/arm_silk_map.c b/silk/arm/arm_silk_map.c index 53a60a0..04767b5 100644 --- a/silk/arm/arm_silk_map.c +++ b/silk/arm/arm_silk_map.c @@ -48,7 +48,7 @@ void (*const SILK_BIQUAD_ALT_STRIDE2_IMPL[OPUS_ARCHMASK + 1])( silk_biquad_alt_stride2_c, /* ARMv4 */ silk_biquad_alt_stride2_c, /* EDSP */ silk_biquad_alt_stride2_c, /* Media */ - MAY_HAVE_NEON(silk_biquad_alt_stride2), /* Neon */ + silk_biquad_alt_stride2_neon, /* Neon */ }; opus_int32 (*const SILK_LPC_INVERSE_PRED_GAIN_IMPL[OPUS_ARCHMASK + 1])( /* O Returns inverse prediction gain in energy domain, Q30 */ @@ -58,7 +58,7 @@ opus_int32 (*const SILK_LPC_INVERSE_PRED_GAIN_IMPL[OPUS_ARCHMASK + 1])( /* O R silk_LPC_inverse_pred_gain_c, /* ARMv4 */ silk_LPC_inverse_pred_gain_c, /* EDSP */ silk_LPC_inverse_pred_gain_c, /* Media */ - MAY_HAVE_NEON(silk_LPC_inverse_pred_gain), /* Neon */ + silk_LPC_inverse_pred_gain_neon, /* Neon */ }; void (*const SILK_NSQ_DEL_DEC_IMPL[OPUS_ARCHMASK + 1])( @@ -81,7 +81,7 @@ void (*const SILK_NSQ_DEL_DEC_IMPL[OPUS_ARCHMASK + 1])( silk_NSQ_del_dec_c, /* ARMv4 */ silk_NSQ_del_dec_c, /* EDSP */ silk_NSQ_del_dec_c, /* Media */ - MAY_HAVE_NEON(silk_NSQ_del_dec), /* Neon */ + silk_NSQ_del_dec_neon, /* Neon */ }; /*There is no table for silk_noise_shape_quantizer_short_prediction because the @@ -115,7 +115,7 @@ void (*const SILK_WARPED_AUTOCORRELATION_FIX_IMPL[OPUS_ARCHMASK + 1])( silk_warped_autocorrelation_FIX_c, /* ARMv4 */ silk_warped_autocorrelation_FIX_c, /* EDSP */ silk_warped_autocorrelation_FIX_c, /* Media */ - MAY_HAVE_NEON(silk_warped_autocorrelation_FIX), /* Neon */ + silk_warped_autocorrelation_FIX_neon, /* Neon */ }; # endif -- 1.9.1
Maybe Matching Threads
- Opus floating-point NEON jump table question
- [PATCH] Optimize silk_LPC_analysis_filter() for ARM NEON
- [PATCH 8/8] Optimize silk_NSQ_del_dec() for ARM NEON
- [PATCH 9/9] Optimize silk_inner_prod_aligned_scale() for ARM NEON
- Opus floating-point NEON jump table question