thr3ads.net - similar to: "[PATCH] 02-"

Displaying 20 results from an estimated 300 matches similar to: "[PATCH] 02-"

[PATCH] 02-Add CELT filter optimizations

2013 May 21

[PATCH] 02-Add CELT filter optimizations

Please ignore my previous mail and patch, there is a new version :). Patch changes are: - Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It increase performance when using optimized macros (ex: ARMv5E). A possible side effect of loop unroll is that i don't check for odd length here. - Add NEON version of FIR filter and autocorr - Add a section in autoconf in order to

[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point

2011 Sep 01

[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point

From: Jyri Sarha <jsarha at ti.com> Semantics of inner_product_single have also been changed to contain the final right shift and saturation so it can also be implemented in the optimal way for the used platform. This change affects fixed point calculations only. I also added a new fixed point macro SATURATE32PSHR(x, shift, a). It does pretty much the same thing as SATURATE32(PSHR32(x,

regarding ARM NEON CELT filter optimizations

2013 May 21

regarding ARM NEON CELT filter optimizations

Hello Aurelien, + "vdup.s16 d8, %1;\n" //Duplicate num in d8 lane + "vdup.s16 q5, %4;\n" //Duplicate mem in q5 lane + + /* We try to process 16 samples at a time */ + "movs %5, %3, lsr #4;\n" + "beq .celt_fir1_process16_done_%=;\n" + + ".celt_fir1_process16_%=:\n" + /* Load 16 x values in q0, q1 lanes */ +

[PATCH 0/5] ARM NEON optimization for samplerate converter

2011 Sep 01

[PATCH 0/5] ARM NEON optimization for samplerate converter

From: Jyri Sarha <jsarha at ti.com> I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the the merge. The patches have been rebased on top of master branch in

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hi Pete, We ran into the same issue with generating vector loads/stores for vectors with less than word alignment. It seems we took a similar approach to solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. As you and Jim mentioned, it looks like the vld1/vst1 instructions should support element aligned access for any armv7 implementation (I'm looking at Table A3-1

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: extend: @ @extend @ BB#0: vldr d16, [r0] vmovl.s16 q8, d16 vstmia r1, {d16, d17} vldr d16, [r0, #8] add r0, r1, #16 vmovl.s16 q8, d16 vstmia

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux. I'm not sure if the effect of lowering load <4 x i16>* align 2 to vld1.16 this was intentional in this change or not. If so, my question is what is the preferable way to

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

[LLVMdev] Unaligned vector memory access for ARM/NEON.

Hello Jim, Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70,5-71). Prior to the table, It does mention the accesses need to be "element" aligned, where I took

[LLVMdev] [global-isel] Type-independence of load/store

2013 Aug 09

[LLVMdev] [global-isel] Type-independence of load/store

Hi Jakob, Sounds like a really exciting topic; I'd love to be involved in implementation. I've not really had time to think about the implications of the larger picture, but one detail did strike me on the first read-through: > On the other hand, when types are not used to select register banks, it > becomes really difficult to explain the difference between load i32 and load >

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 07

[LLVMdev] Unaligned vector memory access for ARM/NEON.

> -----Original Message----- > From: Bob Wilson [mailto:bob.wilson at apple.com] > Sent: Friday, September 07, 2012 10:57 AM > To: David Peixotto > Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > > > On Sep 6, 2012, at 4:40 PM, David Peixotto

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

-----Original Message----- From: Bob Wilson [mailto:bob.wilson at apple.com] Sent: Thursday, September 06, 2012 3:39 PM To: David Peixotto Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

On Sep 5, 2012, at 4:58 PM, Jim Grosbach <grosbach at apple.com> wrote: > Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) > > I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: > extend: @ @extend > @ BB#0: > vldr d16,

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

[LLVMdev] Unaligned vector memory access for ARM/NEON.

On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote: > Hi Pete, > > We ran into the same issue with generating vector loads/stores for vectors > with less than word alignment. It seems we took a similar approach to > solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. > > As you and Jim mentioned, it looks like the

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 07

[LLVMdev] Unaligned vector memory access for ARM/NEON.

On Sep 6, 2012, at 4:40 PM, David Peixotto <dpeixott at codeaurora.org> wrote: > -----Original Message----- > From: Bob Wilson [mailto:bob.wilson at apple.com] > Sent: Thursday, September 06, 2012 3:39 PM > To: David Peixotto > Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Unaligned vector

[LLVMdev] [global-isel] Type-independence of load/store

2013 Aug 09

[LLVMdev] [global-isel] Type-independence of load/store

On Aug 9, 2013, at 5:12 AM, Tim Northover <t.p.northover at gmail.com> wrote: > Sounds like a really exciting topic; I'd love to be involved in > implementation. We need all the volunteers we can get. ;) >> On the other hand, when types are not used to select register banks, it >> becomes really difficult to explain the difference between load i32 and load >>

[LLVMdev] MC Hammer Test results

2012 May 24

[LLVMdev] MC Hammer Test results

Hello everyone At EuroLLVM I presented some testing work we have been doing on improving correctness of the MC Layer for ARM. There seemed to be interest from the community in seeing the results of this test suite. Background ----------- We are using a test suite, called MC Hammer, that compares MC with an ARM in-house implementation of the same functionality. The test space for this suite is

[LLVMdev] ARM aapcs calling convention for small vectors

2012 Sep 21

[LLVMdev] ARM aapcs calling convention for small vectors

Hi all, I was wondering if ARM aapcs calling convention defines how to pass small vectors as parameter to a routine. By small vectors, I mean with size less than a 32-bit integer. For instance if we consider following code: ; ModuleID = 'smallvect.ll' define arm_aapcscc void @foo(<2 x i8>* %p) { L.entry: %0 = load <2 x i8>* %p call arm_aapcscc void @bar(<2 x

[PATCH 2/8] Reorganize pitch_arm.h, so RTCD works for intrinsics functions as well.

2015 Aug 05

[PATCH 2/8] Reorganize pitch_arm.h, so RTCD works for intrinsics functions as well.

--- celt/arm/arm_celt_map.c | 24 +++++++++++- celt/arm/pitch_arm.h | 97 +++++++++++++++++++++++++++++++++---------------- 2 files changed, 88 insertions(+), 33 deletions(-) diff --git a/celt/arm/arm_celt_map.c b/celt/arm/arm_celt_map.c index 0c9acff..cc6b706 100644 --- a/celt/arm/arm_celt_map.c +++ b/celt/arm/arm_celt_map.c @@ -94,9 +94,14 @@ void (*const

ASM runtime detection and optimizations

2013 May 23

ASM runtime detection and optimizations

I wrote a proof of concept regarding the cpu capabilities runtime detection and choice of optimized function. I follow design which had been discussed on IRC. Also, i notice a little drawback: we must propagate the arch index through functions which don't have codec state as argument. However, if it's look good, i will continue to implement it. Best regards, -- Aur?lien Zanelli

AVX Optimizations

2015 Nov 05

AVX Optimizations

Velea, Radu wrote: > Yes, > > Thank you. I'll follow up with the AVX code and tests for pitch code. Actually, I lied. Because you update opus_select_arch(), you can now return a value for arch (4) that is larger than the maximum we currently support (3). This doesn't actually cause failures, because we mask with OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke

similar to: [PATCH] 02-