thr3ads.net - search: "xcorr_kernel

Displaying 8 results from an estimated 8 matches for "xcorr_kernel_neon".

Building Opus (git master) ARM assembly for iOS

2014 Mar 10

...ign 2; .arch armv7-a
  ^
celt/arm/celt_pitch_xcorr_arm-gnu.S:31:4: error: unknown directive
  .fpu neon
  ^
celt/arm/celt_pitch_xcorr_arm-gnu.S:32:4: error: unknown directive
  .object_arch armv4t
  ^
celt/arm/celt_pitch_xcorr_arm-gnu.S:47:2: error: unknown directive
  .type xcorr_kernel_neon, %function; xcorr_kernel_neon: @ PROC
  ^
celt/arm/celt_pitch_xcorr_arm-gnu.S:155:2: error: unknown directive
  .size xcorr_kernel_neon, .-xcorr_kernel_neon @ ENDP
  ^
celt/arm/celt_pitch_xcorr_arm-gnu.S:159:2: error: unknown directive
  .type celt_pitch_xcorr_neon, %function; celt_pitch_xcorr_neon: @...

[PATCH 1/2] Add separate labels for the start of public functions

2014 Mar 19

...letions(-)
diff --git a/celt/arm/celt_pitch_xcorr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s
index 598e45b..f96e0a8 100644
--- a/celt/arm/celt_pitch_xcorr_arm.s
+++ b/celt/arm/celt_pitch_xcorr_arm.s
@@ -42,6 +42,7 @@ IF OPUS_ARM_MAY_HAVE_NEON
 ; Compute sum[k]=sum(x[j]*y[j+k],j=0...len-1), k=0...3
 xcorr_kernel_neon PROC
+xcorr_kernel_neon_start
 ; input:
 ; r3 = int len
 ; r4 = opus_val16 *x
@@ -181,7 +182,7 @@ celt_pitch_xcorr_neon_process4
 VEOR q0, q0, q0
 ; xcorr_kernel_neon only modifies r4, r5, r12, and q0...q3.
 ; So we don't save/restore any other registers....
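
The comment in the diff above defines the kernel as sum[k]=sum(x[j]*y[j+k],j=0...len-1) for k=0...3. As a plain-C illustration of that formula only (not the Opus source), here is a minimal scalar sketch. The name xcorr_kernel_ref and the two typedefs are stand-ins assumed for this example, and accumulating into sum rather than overwriting it is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in typedefs for a fixed-point build (assumed, not copied from Opus). */
typedef int16_t opus_val16;
typedef int32_t opus_val32;

/* Hypothetical scalar reference for the formula in the diff comment:
 *   sum[k] += x[0]*y[k] + x[1]*y[1+k] + ... + x[len-1]*y[len-1+k],  k = 0..3
 * It reads x[0 .. len-1] and y[0 .. len+2]. The caller zeroes sum[] first
 * if a fresh result is wanted. */
static void xcorr_kernel_ref(const opus_val16 *x, const opus_val16 *y,
                             opus_val32 sum[4], int len)
{
    int j, k;
    assert(len >= 0);
    for (k = 0; k < 4; k++) {
        for (j = 0; j < len; j++) {
            sum[k] += (opus_val32)x[j] * (opus_val32)y[j + k];
        }
    }
}
```

With x = {1, 2, 3, 4}, y = {1, 1, 1, 1, 2, 2, 2}, len = 4 and sum zeroed, the four lags come out as 10, 14, 17 and 19. Note that y must hold len + 3 elements for the four lags to stay in bounds.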

[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON

2016 Sep 28

...of the code to test), optimizing xcorr_kernel gives almost as much speed-up as intrinsics for all of celt_fir:
celt_fir_c, xcorr_kernel_c: 1753 ms (stddev 9) [1730 1740 {1740 1740 1740 1750 1750 1750 1750 1750 1750 1750 1750 1750 1750 1750 1760 1760 1760 1760 1770 1770} 1780 1860]
celt_fir_c, xcorr_kernel_neon: 1710 ms (stddev 12) [1680 1690 {1690 1690 1700 1700 1700 1700 1710 1710 1710 1710 1710 1710 1710 1710 1710 1720 1720 1730 1730 1730} 1740 1810]
celt_fir_neon: 1695 ms (stddev 9) [1670 1680 {1680 1680 1680 1690 1690 1690 1690 1690 1690 1690 1700 1700 1700 1700 1700 1700 1700 1700 1710 1710} 1720...
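
The excerpt does not say how the headline figures relate to the bracketed timings. One reading that reproduces the two complete rows is a trimmed mean and a sample standard deviation (n-1 denominator) over the values inside the braces, that is, with the two lowest and two highest runs dropped. The sketch below works under that assumption; the function names are hypothetical, and the square root is a small Newton iteration so the example does not depend on libm.

```c
#include <assert.h>
#include <stddef.h>

/* Square root by Newton's method (keeps the sketch independent of libm). */
static double sqrt_newton(double v)
{
    double r = v > 1.0 ? v : 1.0;
    int i;
    assert(v >= 0.0);
    for (i = 0; i < 60; i++) {
        r = 0.5 * (r + v / r);
    }
    return r;
}

/* Mean after dropping `trim` values from each end of an ascending-sorted array. */
static double trimmed_mean(const int *sorted, size_t n, size_t trim)
{
    double total = 0.0;
    size_t i;
    assert(n > 2 * trim);
    for (i = trim; i < n - trim; i++) {
        total += (double)sorted[i];
    }
    return total / (double)(n - 2 * trim);
}

/* Sample standard deviation (n-1 denominator) over the same trimmed range. */
static double trimmed_stddev(const int *sorted, size_t n, size_t trim)
{
    double mean = trimmed_mean(sorted, n, trim);
    double ss = 0.0;
    size_t kept = n - 2 * trim;
    size_t i;
    assert(kept > 1);
    for (i = trim; i < n - trim; i++) {
        double d = (double)sorted[i] - mean;
        ss += d * d;
    }
    return sqrt_newton(ss / (double)(kept - 1));
}
```

Applied to the 24 timings of the first row with trim = 2, this gives a mean of about 1752.5 ms and a deviation of about 8.5 ms, in line with the quoted "1753 ms (stddev 9)". The second row gives 1710 ms and about 11.7 ms.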

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

...ily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code:

static inline void xcorr_kernel_neon(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len)
{
    int j;
    int32x4_t xsum1 = vld1q_s32(sum);
    int32x4_t xsum2 = vdupq_n_s32(0);
    for (j = 0; j < len-3; j += 4) {
        int16x4_t x0 = vld1_s16(x+j);
        int16x4_t y0 = vld1_s16(y+j);
        int16x4...

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

Hi JM,

I have no doubt that Mr. Zanelli's NEON code is faster, since hand-tuned assembly is bound to be faster than using intrinsics. However, I notice that his code can also read past the y buffer.

Cheers,
--John

On 6/6/2013 9:22 PM, Jean-Marc Valin wrote:
> Hi John,
>
> Thanks for the two fixes. They're in git now. Your SSE version seems to
> also be slightly faster than
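
The message does not show where the over-read happens, so the following is only a generic illustration of how a vectorized 4-lag kernel can touch memory past the end of y. A kernel computing lags k = 0...3 over len samples needs y[0] through y[len+2]. A hypothetical blocked loop that loads eight y values for every four x values touches one element more than that. Both function names are assumptions made for this sketch.

```c
#include <assert.h>

/* Number of y elements a 4-lag kernel legitimately needs: indices 0 .. len+2. */
static int required_y_extent(int len)
{
    assert(len > 0);
    return len + 3;
}

/* Number of y elements touched by a HYPOTHETICAL blocked loop that, for each
 * block of four x samples starting at j, loads y[j .. j+7] (two 4-wide loads).
 * The last block starts at j = len - 4, so the highest index is len + 3. */
static int blocked_y_extent(int len)
{
    assert(len > 0 && len % 4 == 0);
    return (len - 4) + 8;
}
```

For len = 8 the kernel needs 11 elements of y, while the blocked loop touches 12. That is harmless only if the caller guarantees the extra padding.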

opus Digest, Vol 53, Issue 2

2013 Jun 10

...easily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code:

static inline void xcorr_kernel_neon(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len)
{
    int j;
    int32x4_t xsum1 = vld1q_s32(sum);
    int32x4_t xsum2 = vdupq_n_s32(0);
    for (j = 0; j < len-3; j += 4) {
        int16x4_t x0 = vld1_s16(x+j);
        int16x4_t y0 = vld1_s16(y+j);
        int16x4_...

ARM NEON optimization -- celt_fir()

2016 Jun 17

Hi all, This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the next few months. I'm submitting 2 patches in the following couple of emails, which contain the newly created celt_fir_neon(). I revised celt_fir_c() to not pass in argument "mem" in Patch 1. If there are concerns about this change, please let me know. Many thanks for your comments. Linfeng Zhang

Several patches of ARM NEON optimization

2016 Jul 14

I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previously submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang
