Displaying 8 results from an estimated 8 matches for "xcorr_kernel_neon".
2014 Mar 10
2
Building Opus (git master) ARM assembly for iOS
...ign 2; .arch armv7-a
^
celt/arm/celt_pitch_xcorr_arm-gnu.S:31:4: error: unknown directive
.fpu neon
^
celt/arm/celt_pitch_xcorr_arm-gnu.S:32:4: error: unknown directive
.object_arch armv4t
^
celt/arm/celt_pitch_xcorr_arm-gnu.S:47:2: error: unknown directive
.type xcorr_kernel_neon, %function; xcorr_kernel_neon: @ PROC
^
celt/arm/celt_pitch_xcorr_arm-gnu.S:155:2: error: unknown directive
.size xcorr_kernel_neon, .-xcorr_kernel_neon @ ENDP
^
celt/arm/celt_pitch_xcorr_arm-gnu.S:159:2: error: unknown directive
.type celt_pitch_xcorr_neon, %function; celt_pitch_xcorr_neon: @...
2014 Mar 19
3
[PATCH 1/2] Add separate labels for the start of public functions
...letions(-)
diff --git a/celt/arm/celt_pitch_xcorr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s
index 598e45b..f96e0a8 100644
--- a/celt/arm/celt_pitch_xcorr_arm.s
+++ b/celt/arm/celt_pitch_xcorr_arm.s
@@ -42,6 +42,7 @@ IF OPUS_ARM_MAY_HAVE_NEON
; Compute sum[k]=sum(x[j]*y[j+k],j=0...len-1), k=0...3
xcorr_kernel_neon PROC
+xcorr_kernel_neon_start
; input:
; r3 = int len
; r4 = opus_val16 *x
@@ -181,7 +182,7 @@ celt_pitch_xcorr_neon_process4
VEOR q0, q0, q0
; xcorr_kernel_neon only modifies r4, r5, r12, and q0...q3.
; So we don't save/restore any other registers....
2016 Sep 28
2
[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
...of the code to test), optimizing
xcorr_kernel gives almost as much speed-up as intrinsics for all of
celt_fir:
celt_fir_c, xcorr_kernel_c:
1753 ms (stddev 9) [1730 1740 {1740 1740 1740 1750 1750 1750 1750 1750
1750 1750 1750 1750 1750 1750 1760 1760 1760 1760 1770 1770} 1780 1860]
celt_fir_c, xcorr_kernel_neon:
1710 ms (stddev 12) [1680 1690 {1690 1690 1700 1700 1700 1700 1710 1710
1710 1710 1710 1710 1710 1710 1710 1720 1720 1730 1730 1730} 1740 1810]
celt_fir_neon:
1695 ms (stddev 9) [1670 1680 {1680 1680 1680 1690 1690 1690 1690 1690
1690 1690 1700 1700 1700 1700 1700 1700 1700 1700 1710 1710} 1720...
2013 Jun 07
1
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
...ily profile ARM code,
so I really can't tell which method is faster (though I suspect Mr.
Zanelli's code is). Let me offer up another intrinsic version of the
NEON xcorr_kernel that is almost identical to the SSE version, and more
in line with Mr. Zanelli's code:
static inline void xcorr_kernel_neon(const opus_val16 *x, const
opus_val16 *y, opus_val32 sum[4], int len)
{
int j;
int32x4_t xsum1 = vld1q_s32(sum);
int32x4_t xsum2 = vdupq_n_s32(0);
for (j = 0; j < len-3; j += 4) {
int16x4_t x0 = vld1_s16(x+j);
int16x4_t y0 = vld1_s16(y+j);
int16x4...
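For readers following the truncated snippet above, here is a minimal scalar sketch of the computation the kernel is described as performing elsewhere in these results (sum[k] = sum(x[j]*y[j+k]), k = 0...3). It is not the author's code: the name xcorr_kernel_ref is hypothetical, the typedefs assume a fixed-point build where opus_val16 is a 16-bit integer and opus_val32 a 32-bit integer, and accumulating into the incoming sum[] values is inferred from the snippet's initial load of sum.

```c
#include <stdint.h>

/* Assumed fixed-point types; the real definitions live in the Opus headers. */
typedef int16_t opus_val16;
typedef int32_t opus_val32;

/* Hypothetical scalar reference: accumulates four adjacent correlation lags.
 * For each lag k in 0..3, adds sum over j of x[j] * y[j + k] to sum[k].
 * Note that y must hold at least len + 3 readable elements, because the
 * last access is y[len - 1 + 3]. */
static void xcorr_kernel_ref(const opus_val16 *x, const opus_val16 *y,
                             opus_val32 sum[4], int len)
{
    int j, k;
    for (j = 0; j < len; j++) {
        for (k = 0; k < 4; k++) {
            sum[k] += (opus_val32)x[j] * (opus_val32)y[j + k];
        }
    }
}
```

Called with x = {1, 2, 3, 4}, y = {1, 1, 1, 1, 2, 2, 2}, len = 4 and sum zeroed, this sketch leaves sum = {10, 14, 17, 19}. A vectorized version can be checked against such a reference on random inputs.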
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM,
I have no doubt that Mr. Zanelli's NEON code is faster, since hand-tuned
assembly is bound to be faster than using intrinsics. However, I notice
that his code can also read past the end of the y buffer.
Cheers,
--John
On 6/6/2013 9:22 PM, Jean-Marc Valin wrote:
> Hi John,
>
> Thanks for the two fixes. They're in git now. Your SSE version seems to
> also be slightly faster than
2013 Jun 10
0
opus Digest, Vol 53, Issue 2
...easily profile ARM code,
so I really can't tell which method is faster (though I suspect Mr.
Zanelli's code is). Let me offer up another intrinsic version of the
NEON xcorr_kernel that is almost identical to the SSE version, and more
in line with Mr. Zanelli's code:
static inline void xcorr_kernel_neon(const opus_val16 *x, const
opus_val16 *y, opus_val32 sum[4], int len)
{
int j;
int32x4_t xsum1 = vld1q_s32(sum);
int32x4_t xsum2 = vdupq_n_s32(0);
for (j = 0; j < len-3; j += 4) {
int16x4_t x0 = vld1_s16(x+j);
int16x4_t y0 = vld1_s16(y+j);
int16x4_...
2016 Jun 17
5
ARM NEON optimization -- celt_fir()
Hi all,
This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the
next few months.
I'm submitting 2 patches in the following couple of emails, which add the newly
created celt_fir_neon().
I revised celt_fir_c() in Patch 1 so that the "mem" argument is no longer passed in. If there are
concerns about this change, please let me know.
Many thanks for your comments.
Linfeng Zhang
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previously submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang