search for: vmull_s16

Displaying 5 results from an estimated 5 matches for "vmull_s16".

2011 Sep 01
0
[PATCH 4/5] configure.ac: Add ARM NEON support
...55c0b4..08d3d5f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -89,6 +89,23 @@ has_sse=no
 )
 AC_MSG_RESULT($has_sse)
+AC_MSG_CHECKING(for NEON in current arch/CFLAGS)
+AC_LINK_IFELSE([
+AC_LANG_PROGRAM([[
+#include <arm_neon.h>
+int32x4_t testfunc(int16_t *a, int16_t *b) {
+ return vmull_s16(vld1_s16(a), vld1_s16(b));
+}
+]])],
+[
+has_neon=yes
+],
+[
+has_neon=no
+]
+)
+AC_MSG_RESULT($has_neon)
+
 SAVE_CFLAGS="$CFLAGS"
 CFLAGS="$CFLAGS -fvisibility=hidden"
 AC_MSG_CHECKING(for ELF visibility)
@@ -148,6 +165,15 @@ has_sse=no
 fi
 ])
+AC_ARG_ENABLE(neon, [ --ena...
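The probe in this snippet only has to link, but it leans on vmull_s16 being a widening multiply: four int16 lanes in, four int32 products out. A portable scalar sketch of that behaviour follows; the name widening_mul_s16x4 and its array-based signature are hypothetical and are not part of arm_neon.h or of the patch.

```c
#include <stdint.h>

/* Scalar model of a 4-lane widening multiply, in the spirit of vmull_s16.
 * The function name and signature are illustrative assumptions only. */
static void widening_mul_s16x4(const int16_t a[4], const int16_t b[4],
                               int32_t out[4])
{
    for (int lane = 0; lane < 4; lane++) {
        /* Promote to 32 bits before multiplying so the full product is kept. */
        out[lane] = (int32_t)a[lane] * (int32_t)b[lane];
    }
}
```

Because each product of two 16-bit values fits in 32 bits, no lane can overflow, which is why the result type in the probe is int32x4_t rather than int16x4_t.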
2016 Aug 26
2
[PATCH 9/9] Optimize silk_inner_prod_aligned_scale() for ARM NEON
..._t sum_s32x4 = vdupq_n_s32(0);
+ int64x2_t sum_s64x2;
+ int64x1_t sum_s64x1;
+
+ for( i = 0; i < len - 7; i += 8 ) {
+     const int16x8_t in1 = vld1q_s16(&inVec1[i]);
+     const int16x8_t in2 = vld1q_s16(&inVec2[i]);
+     int32x4_t t0 = vmull_s16(vget_low_s16 (in1), vget_low_s16 (in2));
+     int32x4_t t1 = vmull_s16(vget_high_s16(in1), vget_high_s16(in2));
+     t0 = vshlq_s32(t0, scaleLeft_s32x4);
+     sum_s32x4 = vaddq_s32(sum_s32x4, t0);
+     t1 = vshlq_s32(t1, scaleLeft_s...
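The loop in this snippet multiplies pairs of 16-bit samples, shifts each 32-bit product, and accumulates. A scalar sketch of that pattern is given below. It assumes the vector shift amounts to a right shift by a non-negative scale (vshlq_s32 shifts right when given a negative count, but how scaleLeft_s32x4 is set up is not visible in the snippet), and the name inner_prod_scale_ref is hypothetical.

```c
#include <stdint.h>

/* Scalar sketch: widen-multiply, shift right by `scale`, accumulate.
 * Name and signature are hypothetical. Assumes 0 <= scale < 32, that
 * >> on negative values is an arithmetic shift (true on common
 * compilers, implementation-defined in C), and that the running sum
 * stays within int32_t range. */
static int32_t inner_prod_scale_ref(const int16_t *in1, const int16_t *in2,
                                    int scale, int len)
{
    int32_t sum = 0;
    for (int i = 0; i < len; i++) {
        int32_t prod = (int32_t)in1[i] * (int32_t)in2[i];
        sum += prod >> scale;
    }
    return sum;
}
```

A reference like this is mainly useful for checking a vectorized version lane by lane, including the tail elements that an 8-wide loop leaves over.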
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com> I optimized the Speex resampler for NEON-capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the merge. The patches have been rebased on top of master branch in
2016 Aug 23
0
[PATCH 8/8] Optimize silk_NSQ_del_dec() for ARM NEON
...tmp1_s16x4 = vbsl_s16( vorr_u16( equalMinus1_u16x4, lessThanMinus1_u16x4 ), vneg_s16( tmp1_s16x4 ), tmp1_s16x4 );
+ tmp2_s16x4 = vbsl_s16( lessThanMinus1_u16x4, vneg_s16( tmp2_s16x4 ), tmp2_s16x4 );
+ rd1_Q10_s32x4 = vmull_s16( tmp1_s16x4, vdup_n_s16( Lambda_Q10 ) );
+ rd2_Q10_s32x4 = vmull_s16( tmp2_s16x4, vdup_n_s16( Lambda_Q10 ) );
+ }
+
+ rr_Q10_s16x4 = vsub_s16( r_Q10_s16x4, q1_Q10_s16x4 );
+ rd1_Q10_s32x4 = vmlal_s16( rd1_Q10_s32x4, rr_Q10_s16x4, rr_Q10_s16x4 );
+...
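The last visible lines of this snippet pair a widening multiply (vmull_s16) with a widening multiply-accumulate (vmlal_s16). A scalar sketch of that two-step pattern for four lanes is shown here; the function name mull_then_mlal_s16x4 and its parameters are hypothetical, and the sketch covers only the arithmetic, not the surrounding selection logic.

```c
#include <stdint.h>

/* Scalar model of: rd = vmull_s16(tmp, lambda); rd = vmlal_s16(rd, rr, rr).
 * Name and signature are illustrative assumptions. Assumes the
 * accumulated value stays within int32_t range. */
static void mull_then_mlal_s16x4(const int16_t tmp[4], int16_t lambda,
                                 const int16_t rr[4], int32_t rd[4])
{
    for (int lane = 0; lane < 4; lane++) {
        /* Widening multiply: 16 x 16 -> 32 bits. */
        int32_t acc = (int32_t)tmp[lane] * (int32_t)lambda;
        /* Widening multiply-accumulate: add rr squared to the product. */
        acc += (int32_t)rr[lane] * (int32_t)rr[lane];
        rd[lane] = acc;
    }
}
```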
2016 Aug 23
2
[PATCH 7/8] Update NSQ_LPC_BUF_LENGTH macro.
NSQ_LPC_BUF_LENGTH is independent of DECISION_DELAY.
---
 silk/define.h | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/silk/define.h b/silk/define.h
index 781cfdc..1286048 100644
--- a/silk/define.h
+++ b/silk/define.h
@@ -173,11 +173,7 @@ extern "C"
 #define MAX_MATRIX_SIZE MAX_LPC_ORDER /* Max of LPC Order and LTP order */
-#if( MAX_LPC_ORDER >