search for: xcorr_kernel_neon

Displaying 8 results from an estimated 8 matches for "xcorr_kernel_neon".

2014 Mar 10
2
Building Opus (git master) ARM assembly for iOS
...ign 2; .arch armv7-a ^ celt/arm/celt_pitch_xcorr_arm-gnu.S:31:4: error: unknown directive .fpu neon ^ celt/arm/celt_pitch_xcorr_arm-gnu.S:32:4: error: unknown directive .object_arch armv4t ^ celt/arm/celt_pitch_xcorr_arm-gnu.S:47:2: error: unknown directive .type xcorr_kernel_neon, %function; xcorr_kernel_neon: @ PROC ^ celt/arm/celt_pitch_xcorr_arm-gnu.S:155:2: error: unknown directive .size xcorr_kernel_neon, .-xcorr_kernel_neon @ ENDP ^ celt/arm/celt_pitch_xcorr_arm-gnu.S:159:2: error: unknown directive .type celt_pitch_xcorr_neon, %function; celt_pitch_xcorr_neon: @...
2014 Mar 19
3
[PATCH 1/2] Add separate labels for the start of public functions
...letions(-) diff --git a/celt/arm/celt_pitch_xcorr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s index 598e45b..f96e0a8 100644 --- a/celt/arm/celt_pitch_xcorr_arm.s +++ b/celt/arm/celt_pitch_xcorr_arm.s @@ -42,6 +42,7 @@ IF OPUS_ARM_MAY_HAVE_NEON ; Compute sum[k]=sum(x[j]*y[j+k],j=0...len-1), k=0...3 xcorr_kernel_neon PROC +xcorr_kernel_neon_start ; input: ; r3 = int len ; r4 = opus_val16 *x @@ -181,7 +182,7 @@ celt_pitch_xcorr_neon_process4 VEOR q0, q0, q0 ; xcorr_kernel_neon only modifies r4, r5, r12, and q0...q3. ; So we don't save/restore any other registers....
2016 Sep 28
2
[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
...of the code to test), optimizing xcorr_kernel gives almost as much speed-up as intrinsics for all of celt_fir: celt_fir_c, xcorr_kernel_c: 1753 ms (stddev 9) [1730 1740 {1740 1740 1740 1750 1750 1750 1750 1750 1750 1750 1750 1750 1750 1750 1760 1760 1760 1760 1770 1770} 1780 1860] celt_fir_c, xcorr_kernel_neon: 1710 ms (stddev 12) [1680 1690 {1690 1690 1700 1700 1700 1700 1710 1710 1710 1710 1710 1710 1710 1710 1710 1720 1720 1730 1730 1730} 1740 1810] celt_fir_neon: 1695 ms (stddev 9) [1670 1680 {1680 1680 1680 1690 1690 1690 1690 1690 1690 1690 1700 1700 1700 1700 1700 1700 1700 1700 1710 1710} 1720...
2013 Jun 07
1
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
...ily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code: static inline void xcorr_kernel_neon(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len) { int j; int32x4_t xsum1 = vld1q_s32(sum); int32x4_t xsum2 = vdupq_n_s32(0); for (j = 0; j < len-3; j += 4) { int16x4_t x0 = vld1_s16(x+j); int16x4_t y0 = vld1_s16(y+j); int16x4...
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM, I have no doubt that Mr. Zanelli's NEON code is faster, since hand tuned assembly is bound to be faster than using intrinsics. However I notice that his code can also read past the y buffer. Cheers, --John On 6/6/2013 9:22 PM, Jean-Marc Valin wrote: > Hi John, > > Thanks for the two fixes. They're in git now. Your SSE version seems to > also be slightly faster than
2013 Jun 10
0
opus Digest, Vol 53, Issue 2
...easily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code: static inline void xcorr_kernel_neon(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len) { int j; int32x4_t xsum1 = vld1q_s32(sum); int32x4_t xsum2 = vdupq_n_s32(0); for (j = 0; j < len-3; j += 4) { int16x4_t x0 = vld1_s16(x+j); int16x4_t y0 = vld1_s16(y+j); int16x4_...
2016 Jun 17
5
ARM NEON optimization -- celt_fir()
Hi all, This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the next few months. I'm submitting 2 patches in the following couple of emails, which have the new created celt_fir_neon(). I revised celt_fir_c() to not pass in argument "mem" in Patch 1. If there are concerns to this change, please let me know. Many thanks to your comments. Linfeng Zhang
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previous submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang