search for: celt_pitch_xcorr_arm

Displaying 20 results from an estimated 36 matches for "celt_pitch_xcorr_arm".

2014 Mar 10
2
Building Opus (git master) ARM assembly for iOS
...Specifically, if I configure with: ../configure -C CC="xcrun --sdk iphoneos clang -arch armv7" --build=x86_64-apple-darwin13.1.0 --host=arm-apple-darwin11 --enable-fixed-point I get: $ make V=1 /Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive depbase=`echo celt/arm/celt_pitch_xcorr_arm-gnu.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\ /bin/sh ./libtool --mode=compile xcrun --sdk iphoneos clang -arch armv7 -DHAVE_CONFIG_H -I. -I.. -I../include -I../celt -I../silk -I../silk/float -I../silk/fixed -g -O2 -MT celt/arm/celt_pitch_xcorr_arm-gnu.lo -MD -MP -MF $depbase.Tpo...
2014 Feb 07
3
[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions
This is required in order to build using the built-in assembler in clang. --- celt/arm/celt_pitch_xcorr_arm.s | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/celt/arm/celt_pitch_xcorr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s index 09917b1..3c4b950 100644 --- a/celt/arm/celt_pitch_xcorr_arm.s +++ b/celt/arm/celt_pitch_xcorr_arm.s @@ -309,7 +309,7 @@ xcorr_kernel_edsp_pr...
2014 Feb 08
3
[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions
On Fri, 7 Feb 2014, Timothy B. Terriberry wrote: > Martin Storsjo wrote: >> This is required in order to build using the built-in assembler >> in clang. > > These patches break the gcc build (with "Error: bad instruction"). Ah, right, sorry about that. > Documentation I've seen is contradictory on which order ({cond}{size} or > {size}{cond}) is correct.
2014 Oct 15
0
Errors when compiling for ARM Cortex-M4
Hi, I had the following errors when compiling the library for Cortex-M4 (-mcpu=cortex-m4) using the GNU compiler for ARM (gcc-arm-none-eabi-4_8). CPPAS celt/arm/celt_pitch_xcorr_arm-gnu.lo celt/arm/celt_pitch_xcorr_arm-gnu.S: Assembler messages: celt/arm/celt_pitch_xcorr_arm-gnu.S:299: Error: thumb conditional instruction should be in IT block -- `ldrgt r12,[r4],#4' celt/arm/celt_pitch_xcorr_arm-gnu.S:317: Error: thumb conditional instruction should be in IT block -- `ldrhgt...
2014 Feb 08
0
[PATCH v2] arm: Use the UAL syntax for instructions
This is required in order to build using the built-in assembler in clang. --- I squashed the two changes since it would break the normal gcc build otherwise. --- celt/arm/arm2gnu.pl | 2 ++ celt/arm/celt_pitch_xcorr_arm.s | 18 +++++++++--------- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/celt/arm/arm2gnu.pl b/celt/arm/arm2gnu.pl index eab42ef..5c24758 100755 --- a/celt/arm/arm2gnu.pl +++ b/celt/arm/arm2gnu.pl @@ -25,6 +25,8 @@ $n=0; $thumb = 0; # ARM mode by default, not Thumb. @proc_st...
2014 Mar 19
3
[PATCH 1/2] Add separate labels for the start of public functions
This avoids having to use the public symbol name when jumping here, on platforms where the public symbols have an underscore prefix. --- This avoids having to add heuristics for adding prefixes to symbols in jumps to local labels as well. --- celt/arm/celt_pitch_xcorr_arm.s | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/celt/arm/celt_pitch_xcorr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s index 598e45b..f96e0a8 100644 --- a/celt/arm/celt_pitch_xcorr_arm.s +++ b/celt/arm/celt_pitch_xcorr_arm.s @@ -42,6 +42,7 @@ IF OPUS_ARM_MAY_HAVE_NEON ; Co...
2014 Dec 19
3
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...e that it is measurably better > than the simple loop. But I would sincerely prefer it this way.. It is > fairly straight forward unroll in my opinion. It makes sense to optimize for executed cycles or readability/code size, but this doesn't really do either. You'll see the code in celt_pitch_xcorr_arm.s processes the last samples in groups of 2, 1, 1. It costs one extra compare (only one, because keep in mind the switch has an extra compare since gcc is not smart enough to prove that len==0 is impossible, despite our assert), but avoids an indirect jump. That's all probably irrelevant, a...
2014 Feb 13
1
[PATCH v2] arm: Use the UAL syntax for instructions
On Sat, 8 Feb 2014, Martin Storsjo wrote: > This is required in order to build using the built-in assembler > in clang. > --- > I squashed the two changes since it would break the normal gcc > build otherwise. > --- > celt/arm/arm2gnu.pl | 2 ++ > celt/arm/celt_pitch_xcorr_arm.s | 18 +++++++++--------- > 2 files changed, 11 insertions(+), 9 deletions(-) Ping, any further comments on this one? The place in arm2gnu.pl where ".syntax unified" is added could probably be changed to some better place if there's suggestions, but this works at least. // Mar...
2014 Dec 19
2
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...2(xi++); > + SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0); > + YY[0]= vld1q_f32(yi++); > + len--; > + } > + > + if (len > 0) { > + XX_2 = vld1_dup_f32(xi); > + SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0); > + } Hi Timothy, After analyzing celt_pitch_xcorr_arm.s, the closest I came using intrinsics is below code.. which didn't really put much dent in the performance.. so I just left it out since above code submitted is much simpler to read than below celt_pitch_xcorr_arm.s.. So, I request to leave it simple to read for now. float32x2_t YY_2; whil...
2014 Dec 11
2
[ARM][FFT][NEON] Integrate Ne10 into Opus?
...s? I am not familiar with the configure script, but I find "Optional Packages" in it. If we provide a --with-ne10-fft option, the one extra thing that users need to do is to indicate where libne10 is, right? Is there any third-party package already enabled in Opus? Or can we follow the pattern where celt_pitch_xcorr_arm is optimized? We can include a few source files from Ne10 and put them under the celt/arm directory. P.S. Sorry to send again and again. It turned out I had not registered on the list. Best Regards, Phil Wang -------------- next part -------------- An HTML attachment was scrubbed... URL: http:/...
2017 Jun 06
2
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng, On 06/06/17 04:09 PM, Jonathan Lennox wrote: > Two comments on the various infrastructure for RTCD etc. > > 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, > but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s > correspondingly. I suspect the ‘arch’ parameter can just be ignored > by the assembly functions, but at least the comments in that file > should be updated to indicate the register that’s used to pass it in, > and that it’s ignored. > > 2. In the 0003- patch, you shouldn’t u...
2014 Dec 19
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...>> than the simple loop. But I would sincerely prefer it this way.. It is >> fairly straight forward unroll in my opinion. > > It makes sense to optimize for executed cycles or readability/code size, > but this doesn't really do either. > > You'll see the code in celt_pitch_xcorr_arm.s processes the last samples > in groups of 2, 1, 1. It costs one extra compare (only one, because keep > in mind the switch has an extra compare since gcc is not smart enough to > prove that len==0 is impossible, despite our assert), but avoids an > indirect jump. That's all probab...
2014 Sep 05
2
Opus decoding performance on ARM devices
Hi, Thank you for your response. I pulled yesterday to commit da97db1ca1f92592af3534c9a2596da0e9a009ca, added a bunch more defines to my compile options, and assembled & linked in armopts.s and celt_pitch_xcorr_arm.s. Performance jumped from about 4.8 Mb/s to 5.3 Mb/s on the same device, so it is an improvement. Not sure what other tweaks there would be to try, but if it could match the tremolo decoder, we could probably throw that out entirely, which would be very nice. Thanks! Dan On 04/09/14 19:40, "...
2014 Dec 01
0
[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...= vld1q_f32(yi); > > If len == 4, then in the first iteration you will have loaded 8 y > values, but only 7 are guaranteed to be available (e.g., the C code only > references y[0] up to y[len-1+3]). You need to end this loop early and > fall back to another approach. See comments in celt_pitch_xcorr_arm.s > for details and an example (there are other useful comments there that > could shave another cycle or two from this inner loop). Analyzed the implementation in celt_pitch_xcorr_arm.s. I will re-do my implementation to follow same algorithm. It seems more elegant. This comment applies to r...
2014 Dec 19
2
[PATCH v1] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for ARM NEON floating point. Changes from RFCv3: - celt_neon_intr.c - removed warnings due to not having constant pointers - Put simpler loop to take care of corner cases. Unrolling using intrinsics was not really mapping well to what was done in celt_pitch_xcorr_arm.s - Makefile.am Removed explicit -O3 optimization - test_unit_mathops.c, test_unit_rotation.c followed recommendation to use #if #elif to guarantee that only one of "arm/arm_celt_map.c" or "x86/x86_celt_map.c" is included Viswanath Puttagunta (1): armv7: cel...
2014 Dec 07
3
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...Changes from RFCv1: - Rebased on top of commit aad281878: Fix celt_pitch_xcorr_c signature. which got rid of ugly code around CELT_PITCH_XCORR_IMPL passing of "arch" parameter. - Unified with --enable-intrinsics used by x86 - Modified algorithm to be more in-line with algorithm in celt_pitch_xcorr_arm.s Viswanath Puttagunta (1): armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics Makefile.am | 11 ++ celt/arm/arm_celt_map.c | 15 ++- celt/arm/celt_neon_intr.c | 242 +++++++++++++++++++++++++++++++++++++++ celt/arm/pitch_arm.h | 13 ++-...
2014 Nov 28
2
[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...i += 4; > + YY[4] = vld1q_f32(yi); If len == 4, then in the first iteration you will have loaded 8 y values, but only 7 are guaranteed to be available (e.g., the C code only references y[0] up to y[len-1+3]). You need to end this loop early and fall back to another approach. See comments in celt_pitch_xcorr_arm.s for details and an example (there are other useful comments there that could shave another cycle or two from this inner loop). > + YY[1] = vextq_f32(YY[0], YY[4], 1); > + YY[2] = vextq_f32(YY[0], YY[4], 2); > + YY[3] = vextq_f32(YY[0], YY[4], 3); > + > + XX[0] = vld1q_dup_f3...
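The bounds remark in the snippet above (only y[0] up to y[len-1+3] may be read) can be illustrated with a scalar sketch. This is not the Opus source: the names `xcorr_kernel_ref` and `xcorr_y_span` are hypothetical, and the code only assumes the four-lag kernel shape suggested by the quoted signature.

```c
#include <assert.h>

/* Hypothetical scalar reference for a four-lag cross-correlation kernel.
   It reads x[0..len-1] and y[0..len+2], i.e. len+3 values of y. */
void xcorr_kernel_ref(const float *x, const float *y, float sum[4], int len)
{
    assert(len > 0);
    for (int j = 0; j < len; j++) {
        for (int k = 0; k < 4; k++) {
            /* Highest y index touched is (len-1) + 3. */
            sum[k] += x[j] * y[j + k];
        }
    }
}

/* Number of y elements the reference kernel is allowed to touch. */
int xcorr_y_span(int len)
{
    return len + 3;
}
```

With len == 4 the span is 7, so a vectorized loop that loads two full 4-float vectors of y (8 values) before checking the remaining length would read one element past what this reference guarantees.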
2017 Jun 01
4
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi, Attached are 5 patches related to celt_inner_prod() and dual_inner_prod() NEON intrinsics optimization. In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the optimization changed the order of floating-point inner products, which will change the results. I created celt_inner_prod_neon_float_c_simulation() and dual_inner_prod_neon_float_c_simulation() to simulate the order
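Why a different accumulation order changes floating-point results can be shown with a small scalar sketch. The function names below are hypothetical and are not the `*_c_simulation()` functions from the patch; the lane layout and the final reduction order are assumptions chosen only to show the effect.

```c
#include <assert.h>

/* Straightforward left-to-right accumulation. */
float inner_prod_sequential(const float *x, const float *y, int len)
{
    float acc = 0.0f;
    for (int i = 0; i < len; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Hypothetical scalar model of a 4-lane SIMD accumulation:
   element i is added into lane i % 4, and the lanes are reduced at the end. */
float inner_prod_4lane_sim(const float *x, const float *y, int len)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float acc;
    int i;

    assert(len >= 0);
    for (i = 0; i + 4 <= len; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] += x[i + k] * y[i + k];

    /* Assumed reduction order: upper half onto lower half, then the pair. */
    acc = (lane[0] + lane[2]) + (lane[1] + lane[3]);

    /* Leftover elements are handled sequentially. */
    for (; i < len; i++)
        acc += x[i] * y[i];
    return acc;
}
```

For x = {1e8f, 1.0f, -1e8f, 1.0f} and y all ones, on a platform that evaluates float arithmetic in single precision, the sequential sum loses the first 1.0f when it is added to 1e8f and returns 1.0f, while the lane model cancels the large terms first and returns 2.0f. A bit-exact test against SIMD code therefore needs a C model that follows the same order.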
2014 Dec 09
1
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
..._kernel_neon_float(const float *x, const float *y, > + float *sum, int len) { I had to think quite a bit about what "3to1" meant (since it is describing the context of the caller, not what the actual function does). I'd follow the naming convention in the existing celt_pitch_xcorr_arm.s, and use "process1", personally. > + int i; > + float32x4_t XX[4]; > + float32x4_t YY[4]; > + float32x4_t SUMM; > + float32x2_t ZERO; > + float32x2x2_t tv; > + float sumi; > + float *xi = x; > + float *yi = y; > + > + ZERO = vdup_n_f...
2014 Dec 07
0
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...on top of commit > aad281878: Fix celt_pitch_xcorr_c signature. > which got rid of ugly code around CELT_PITCH_XCORR_IMPL > passing of "arch" parameter. > - Unified with --enable-intrinsics used by x86 > - Modified algorithm to be more in-line with algorithm in > celt_pitch_xcorr_arm.s > > Viswanath Puttagunta (1): > armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics > > Makefile.am | 11 ++ > celt/arm/arm_celt_map.c | 15 ++- > celt/arm/celt_neon_intr.c | 242 +++++++++++++++++++++++++++++++++++++++ > celt/a...