similar to: [PATCH] 02-Add CELT filter optimizations

Displaying 20 results from an estimated 700 matches similar to: "[PATCH] 02-Add CELT filter optimizations"

2013 May 21
0
[PATCH] 02-Add CELT filter optimizations
- Use MAC16_16 macros instead of (sum += a*b) and unroll the loop by 2. This increases performance when optimized macros are available (e.g. ARMv5E). A possible side effect of the loop unrolling is that odd lengths are not checked for here. - Add NEON versions of the FIR filter and autocorr -- Aurélien Zanelli Parrot SA 174, quai de Jemmapes 75010 Paris France
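For context, a minimal sketch of the kind of change described: a generic MAC16_16 definition and an inner product unrolled by 2. The macro follows CELT's fixed-point conventions, but the loop itself is illustrative, not the actual patch.

    #include <stdint.h>

    /* Generic C fallback; ARMv5E builds would map this to a single
     * smulbb-plus-accumulate instruction. */
    #define MAC16_16(c, a, b) \
        ((c) + (int32_t)(int16_t)(a) * (int32_t)(int16_t)(b))

    /* Inner product unrolled by 2; assumes len is even, matching the
     * "don't check for odd length" caveat in the message. */
    static int32_t inner_prod(const int16_t *x, const int16_t *y, int len)
    {
        int32_t sum = 0;
        for (int i = 0; i < len; i += 2) {
            sum = MAC16_16(sum, x[i],     y[i]);
            sum = MAC16_16(sum, x[i + 1], y[i + 1]);
        }
        return sum;
    }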
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
From: Jyri Sarha <jsarha at ti.com> The semantics of inner_product_single have also been changed to include the final right shift and saturation, so it can be implemented optimally for each target platform. This change affects fixed-point calculations only. I also added a new fixed-point macro, SATURATE32PSHR(x, shift, a). It does pretty much the same thing as SATURATE32(PSHR32(x,
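A plausible C sketch of the stated semantics: right shift with rounding, then clamp to [-a, a]. Written as a function for clarity (the real code is a macro); this illustrates the description above, not the exact patch text.

    #include <stdint.h>

    /* Sketch: PSHR32-style round-and-shift, then saturate to [-a, a]. */
    static inline int32_t saturate32_pshr(int32_t x, int shift, int32_t a)
    {
        int32_t v = (x + (1 << (shift - 1))) >> shift;  /* round, then shift */
        if (v > a)  return a;
        if (v < -a) return -a;
        return v;
    }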
2013 May 21
0
regarding ARM NEON CELT filter optimizations
Hello Aurelien, + "vdup.s16 d8, %1;\n" //Duplicate num in d8 lane + "vdup.s16 q5, %4;\n" //Duplicate mem in q5 lane + + /* We try to process 16 samples at a time */ + "movs %5, %3, lsr #4;\n" + "beq .celt_fir1_process16_done_%=;\n" + + ".celt_fir1_process16_%=:\n" + /* Load 16 x values in q0, q1 lanes */ +
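In intrinsics form, the structure under review looks roughly like the following: duplicate the scalar coefficient across a vector register, then consume 16 samples per iteration. This is a sketch of the pattern, not Aurélien's actual assembly; the real code accumulates in 32 bits with rounding, which is omitted here for brevity.

    #include <arm_neon.h>

    /* Sketch of a first-order FIR: y[i] = x[i] + num * x[i-1].
     * Processes 16 samples per iteration; assumes len % 16 == 0 and
     * that x[-1] (the filter memory) is readable. */
    static void celt_fir1_sketch(const int16_t *x, int16_t num,
                                 int16_t *y, int len)
    {
        const int16x8_t vnum = vdupq_n_s16(num);   /* duplicate num across lanes */
        for (int i = 0; i < len; i += 16) {
            int16x8_t x0 = vld1q_s16(&x[i]);       /* 16 x values in two q regs */
            int16x8_t x1 = vld1q_s16(&x[i + 8]);
            int16x8_t d0 = vld1q_s16(&x[i - 1]);   /* x delayed by one sample */
            int16x8_t d1 = vld1q_s16(&x[i + 7]);
            vst1q_s16(&y[i],     vmlaq_s16(x0, d0, vnum));
            vst1q_s16(&y[i + 8], vmlaq_s16(x1, d1, vnum));
        }
    }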
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com> I optimized the Speex resampler for NEON-capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help with the merge. The patches have been rebased on top of the master branch in
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: extend: @ @extend @ BB#0: vldr d16, [r0] vmovl.s16 q8, d16 vstmia r1, {d16, d17} vldr d16, [r0, #8] add r0, r1, #16 vmovl.s16 q8, d16 vstmia
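Judging from the generated code (vldr, vmovl.s16, vstmia over two halves), the test case is presumably a sign-extension loop along these lines; the C version here is a guess reconstructed from the assembly, not the original source.

    #include <stdint.h>

    /* Sign-extend 8 int16 values to int32. With 16-byte alignment the
     * compiler emits two 64-bit loads plus vmovl.s16, as shown above. */
    void extend(const int16_t *src, int32_t *dst)
    {
        for (int i = 0; i < 8; i++)
            dst[i] = src[i];
    }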
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 5, 2012, at 4:58 PM, Jim Grosbach <grosbach at apple.com> wrote: > Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) > > I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: > extend: @ @extend > @ BB#0: > vldr d16,
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hi Pete, We ran into the same issue with generating vector loads/stores for vectors with less than word alignment. It seems we took a similar approach to solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. As you and Jim mentioned, it looks like the vld1/vst1 instructions should support element aligned access for any armv7 implementation (I'm looking at Table A3-1
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on Darwin, but not Linux. I'm not sure whether lowering load <4 x i16>* align 2 to vld1.16 was intentional in this change or not. If so, my question is what is the preferable way to
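The case being discussed, a vector load with only element alignment, can be provoked from C along these lines; a sketch assuming GCC/clang NEON intrinsics and memcpy-based type punning.

    #include <arm_neon.h>
    #include <string.h>

    /* p is only guaranteed 2-byte (element) alignment, so the IR load of
     * <4 x i16> carries align 2; the question is whether that lowers to
     * vld1.16 or to scalar loads. */
    int16x4_t load4_elem_aligned(const int16_t *p)
    {
        int16x4_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }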
2012 Sep 07
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
> -----Original Message----- > From: Bob Wilson [mailto:bob.wilson at apple.com] > Sent: Friday, September 07, 2012 10:57 AM > To: David Peixotto > Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > > > On Sep 6, 2012, at 4:40 PM, David Peixotto
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
-----Original Message----- From: Bob Wilson [mailto:bob.wilson at apple.com] Sent: Thursday, September 06, 2012 3:39 PM To: David Peixotto Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
2012 Sep 07
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 4:40 PM, David Peixotto <dpeixott at codeaurora.org> wrote: > -----Original Message----- > From: Bob Wilson [mailto:bob.wilson at apple.com] > Sent: Thursday, September 06, 2012 3:39 PM > To: David Peixotto > Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Unaligned vector
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote: > Hi Pete, > > We ran into the same issue with generating vector loads/stores for vectors > with less than word alignment. It seems we took a similar approach to > solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. > > As you and Jim mentioned, it looks like the
2020 Mar 31
2
[ARM] Register pressure with -mthumb forces register reload before each call
Hi, Compiling the attached test case, which is a reduced version of uECC_shared_secret from the tinycrypt library [1], with --target=arm-linux-gnueabi -march=armv6-m -Oz -S results in reloading of the register holding the function's address before every call through blx: ldr r3, .LCPI0_0 blx r3 mov r0, r6 mov r1, r5 mov r2, r4 ldr r3,
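A hypothetical reduction in the same spirit (the real test case comes from tinycrypt's uECC_shared_secret, which is not reproduced here): under high register pressure, every indirect call reloads the callee address into a register before the blx.

    /* Compile with: clang --target=arm-linux-gnueabi -march=armv6-m -Oz -S
     * Hypothetical stand-in for the attached test case. */
    extern void g(int, int, int);

    void f(int a, int b, int c)
    {
        g(a, b, c);   /* each call: ldr r3, .LCPIx_y ; blx r3 */
        g(c, a, b);
        g(b, c, a);
    }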
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim, Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70,5-71). Prior to the table, it does mention the accesses need to be "element" aligned, where I took
2020 Apr 15
4
[ARM] Register pressure with -mthumb forces register reload before each call
Hi, I have attached a WIP patch adding foldMemoryOperand to Thumb1InstrInfo. For the following case: void f(int x, int y, int z) { void bar(int, int, int); bar(x, y, z); bar(x, z, y); bar(y, x, z); bar(y, y, x); } it calls foldMemoryOperand twice, and thus converts two calls from blx to bl. callMI->dump() shows the function name "bar" correctly; however, in generated
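For readability, the test case quoted inline above, reflowed:

    void f(int x, int y, int z)
    {
        void bar(int, int, int);
        bar(x, y, z);
        bar(x, z, y);
        bar(y, x, z);
        bar(y, y, x);
    }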
2012 Sep 21
2
[LLVMdev] ARM aapcs calling convention for small vectors
Hi all, I was wondering whether the ARM AAPCS calling convention defines how to pass small vectors as parameters to a routine. By small vectors, I mean vectors smaller than a 32-bit integer. For instance, consider the following code: ; ModuleID = 'smallvect.ll' define arm_aapcscc void @foo(<2 x i8>* %p) { L.entry: %0 = load <2 x i8>* %p call arm_aapcscc void @bar(<2 x
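The same question expressed in C, using the GCC/clang vector extension; a sketch of the shape of source that presumably produced the IR in the message, not the original code.

    #include <stdint.h>

    typedef uint8_t v2i8 __attribute__((vector_size(2)));  /* 16-bit vector */

    void bar(v2i8 v);

    /* Does AAPCS say whether v travels in the low half of a GPR, widened
     * to a full register, or indirectly? That is the question asked. */
    void foo(const v2i8 *p)
    {
        bar(*p);
    }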
2012 Jun 24
0
[LLVMdev] Request for merge: GHC/ARM calling convention.
Hi Karel, I understand this patch has already been merged (to 3.0), so don't take my question as blocking the merge to head; I'm just making sure I got it right... The rest looks correct. + CCIfType<[v2f64], CCAssignToReg<[Q4, Q5]>>, + CCIfType<[f64], CCAssignToReg<[D8, D9, D10, D11]>>, + CCIfType<[f32], CCAssignToReg<[S16, S17, S18, S19, S20, S21, S22,
2015 Aug 05
0
[PATCH 2/8] Reorganize pitch_arm.h, so RTCD works for intrinsics functions as well.
--- celt/arm/arm_celt_map.c | 24 +++++++++++- celt/arm/pitch_arm.h | 97 +++++++++++++++++++++++++++++++++---------------- 2 files changed, 88 insertions(+), 33 deletions(-) diff --git a/celt/arm/arm_celt_map.c b/celt/arm/arm_celt_map.c index 0c9acff..cc6b706 100644 --- a/celt/arm/arm_celt_map.c +++ b/celt/arm/arm_celt_map.c @@ -94,9 +94,14 @@ void (*const
2013 May 23
2
ASM runtime detection and optimizations
I wrote a proof of concept for runtime detection of CPU capabilities and selection of the optimized function, following the design that was discussed on IRC. I also noticed a small drawback: the arch index must be propagated through functions that don't take the codec state as an argument. However, if it looks good, I will continue to implement it. Best regards, -- Aurélien Zanelli
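A design like the one described usually boils down to a table of function pointers indexed by the detected arch. A minimal sketch with hypothetical names, not the actual proof of concept:

    typedef int (*xcorr_fn)(const short *x, const short *y, int len);

    /* Hypothetical implementations; in a real tree, each arch-specific
     * version would live behind its own build-time guard. */
    static int xcorr_c(const short *x, const short *y, int len)
    {
        int sum = 0;
        for (int i = 0; i < len; i++)
            sum += x[i] * y[i];
        return sum;
    }
    static int xcorr_edsp(const short *x, const short *y, int len)
    { return xcorr_c(x, y, len); }  /* stand-in for an ARMv5E version */
    static int xcorr_neon(const short *x, const short *y, int len)
    { return xcorr_c(x, y, len); }  /* stand-in for a NEON version */

    /* One entry per arch index returned by runtime CPU detection. */
    static xcorr_fn const xcorr_impl[] = { xcorr_c, xcorr_edsp, xcorr_neon };

    /* The arch index has to be threaded through callers that have no
     * codec state -- the drawback noted above. */
    static inline int xcorr(const short *x, const short *y, int len, int arch)
    {
        return xcorr_impl[arch](x, y, len);
    }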
2015 Nov 05
0
AVX Optimizations
Velea, Radu wrote: > Yes, > > Thank you. I'll follow up with the AVX code and tests for pitch code. Actually, I lied. Because you update opus_select_arch(), you can now return a value for arch (4) that is larger than the maximum we currently support (3). This doesn't actually cause failures, because we mask with OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke
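The hazard is easy to see in the arithmetic; a sketch assuming the mask value stays at 3 while the new arch id is 4:

    #define OPUS_ARCHMASK 3                 /* covers arch indices 0..3 */

    int masked_index(void)
    {
        int arch = 4;                       /* hypothetical AVX arch id */
        return arch & OPUS_ARCHMASK;        /* == 0: the masked index wraps,
                                               silently selecting the plain
                                               C implementation */
    }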