Displaying 20 results from an estimated 300 matches similar to: "[PATCH] 02-"
2013 May 21
2
[PATCH] 02-Add CELT filter optimizations
Please ignore my previous mail and patch, there is a new version :).
Patch changes are:
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. This
increases performance when using optimized macros (e.g. ARMv5E). A
possible side effect of the loop unrolling is that I don't check for odd length
here.
- Add NEON version of FIR filter and autocorr
- Add a section in autoconf in order to
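The odd-length concern in the first bullet can be illustrated with a minimal sketch of a dot product unrolled by 2 that also handles a leftover sample. This is not the code from the patch: `MAC16_16_SKETCH` and `dot_unrolled2` are hypothetical names, and the macro is only assumed to behave like a plain 16x16 multiply-accumulate into a 32-bit sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a MAC16_16-style macro: c + a*b with
 * 16-bit inputs and a 32-bit accumulator. */
#define MAC16_16_SKETCH(c, a, b) ((c) + (int32_t)(a) * (int32_t)(b))

/* Dot product with the loop unrolled by 2, plus a tail step so that
 * an odd length does not drop or overrun the last sample. */
static int32_t dot_unrolled2(const int16_t *x, const int16_t *y, size_t len)
{
    int32_t sum = 0;
    size_t i;

    assert(x != NULL && y != NULL);

    /* Main loop: two multiply-accumulates per iteration. */
    for (i = 0; i + 1 < len; i += 2) {
        sum = MAC16_16_SKETCH(sum, x[i], y[i]);
        sum = MAC16_16_SKETCH(sum, x[i + 1], y[i + 1]);
    }

    /* Tail: runs only when len is odd. */
    if (i < len)
        sum = MAC16_16_SKETCH(sum, x[i], y[i]);

    return sum;
}
```

Without the tail step, an unrolled loop either skips the last sample or reads one element past the end when the length is odd, which is the side effect the message warns about.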
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
From: Jyri Sarha <jsarha at ti.com>
Semantics of inner_product_single have also been changed to contain
the final right shift and saturation so it can also be implemented in
the optimal way for the used platform. This change affects fixed point
calculations only.
I also added a new fixed point macro SATURATE32PSHR(x, shift, a). It
does pretty much the same thing as SATURATE32(PSHR32(x,
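The excerpt is cut off before the macro is fully described, so the following is only a sketch of the general idea it names: a rounding right shift followed by saturation. All function names are hypothetical, and the symmetric clamp to [-a, a] is an assumption, not the definition from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding right shift: add half of the divisor, then shift.
 * The sum is widened to 64 bits so the rounding term cannot overflow.
 * Assumes an arithmetic right shift for negative values, which is
 * implementation-defined in C but holds on common compilers. */
static int32_t pshr32_sketch(int32_t x, int shift)
{
    assert(shift > 0 && shift < 31);
    int64_t rounded = (int64_t)x + ((int64_t)1 << (shift - 1));
    return (int32_t)(rounded >> shift);
}

/* Clamp x to the symmetric range [-a, a] (assumed convention). */
static int32_t saturate32_sketch(int32_t x, int32_t a)
{
    if (x > a)
        return a;
    if (x < -a)
        return -a;
    return x;
}

/* Shift first, then saturate, as one combined operation. */
static int32_t saturate32_pshr_sketch(int32_t x, int shift, int32_t a)
{
    return saturate32_sketch(pshr32_sketch(x, shift), a);
}
```

Folding the shift and the clamp into a single operation is what lets a platform-specific version map both onto one saturating instruction where the hardware offers it.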
2013 May 21
0
regarding ARM NEON CELT filter optimizations
Hello Aurelien,
+ "vdup.s16 d8, %1;\n" //Duplicate num in d8 lane
+ "vdup.s16 q5, %4;\n" //Duplicate mem in q5 lane
+
+ /* We try to process 16 samples at a time */
+ "movs %5, %3, lsr #4;\n"
+ "beq .celt_fir1_process16_done_%=;\n"
+
+ ".celt_fir1_process16_%=:\n"
+ /* Load 16 x values in q0, q1 lanes */
+
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com>
I optimized Speex resampler for NEON capable ARM CPUs. The first patch
should speed up resampling on any platform that can spare the
increased memory usage. It would be nice to have these merged to the
master branch. Please let me know if there is anything I can do to
help with the merge. The patches have been rebased on top of master
branch in
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hi Pete,
We ran into the same issue with generating vector loads/stores for vectors
with less than word alignment. It seems we took a similar approach to
solving the problem by modifying the logic in allowsUnalignedMemoryAccesses.
As you and Jim mentioned, it looks like the vld1/vst1 instructions should
support element aligned access for any armv7 implementation (I'm looking at
Table A3-1
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
extend: @ @extend
@ BB#0:
vldr d16, [r0]
vmovl.s16 q8, d16
vstmia r1, {d16, d17}
vldr d16, [r0, #8]
add r0, r1, #16
vmovl.s16 q8, d16
vstmia
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello,
Thanks again. We did try overestimating the alignment, and saw the vldr
you reference here.
It looks like a recent change (r161962?) did enable vld1 generation for
this case (great!) on darwin, but not linux.
I'm not sure whether the effect of lowering load <4 x i16>* align 2 to
vld1.16 was intentional in this change or not.
If so, my question is what is the preferable way to
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim,
Thank you for the response. I may be confused about the alignment rules
here.
I had been looking at the ARM RVCT Assembler Guide, which seems to
indicate vld1.16 operates on 16-bit aligned data, unless I am
misinterpreting their table
(Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
Prior to the table, it does mention the accesses need to be "element"
aligned, where I took
2013 Aug 09
3
[LLVMdev] [global-isel] Type-independence of load/store
Hi Jakob,
Sounds like a really exciting topic; I'd love to be involved in
implementation. I've not really had time to think about the
implications of the larger picture, but one detail did strike me on
the first read-through:
> On the other hand, when types are not used to select register banks, it
> becomes really difficult to explain the difference between load i32 and load
>
2012 Sep 07
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Friday, September 07, 2012 10:57 AM
> To: David Peixotto
> Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
>
>
> On Sep 6, 2012, at 4:40 PM, David Peixotto
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
-----Original Message-----
From: Bob Wilson [mailto:bob.wilson at apple.com]
Sent: Thursday, September 06, 2012 3:39 PM
To: David Peixotto
Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 5, 2012, at 4:58 PM, Jim Grosbach <grosbach at apple.com> wrote:
> Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16,
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> Hi Pete,
>
> We ran into the same issue with generating vector loads/stores for vectors
> with less than word alignment. It seems we took a similar approach to
> solving the problem by modifying the logic in allowsUnalignedMemoryAccesses.
>
> As you and Jim mentioned, it looks like the
2012 Sep 07
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 4:40 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Thursday, September 06, 2012 3:39 PM
> To: David Peixotto
> Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Unaligned vector
2013 Aug 09
0
[LLVMdev] [global-isel] Type-independence of load/store
On Aug 9, 2013, at 5:12 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> Sounds like a really exciting topic; I'd love to be involved in
> implementation.
We need all the volunteers we can get. ;)
>> On the other hand, when types are not used to select register banks, it
>> becomes really difficult to explain the difference between load i32 and load
>>
2012 May 24
0
[LLVMdev] MC Hammer Test results
Hello everyone
At EuroLLVM I presented some testing work we have been doing on improving
correctness of the MC Layer for ARM. There seemed to be interest from the
community in seeing the results of this test suite.
Background
-----------
We are using a test suite, called MC Hammer, that compares MC with an ARM
in-house implementation of the same functionality. The test space for this suite
is
2012 Sep 21
2
[LLVMdev] ARM aapcs calling convention for small vectors
Hi all,
I was wondering if ARM aapcs calling convention defines how to pass small vectors as parameter to a routine.
By small vectors, I mean with size less than a 32-bit integer. For instance if we consider following code:
; ModuleID = 'smallvect.ll'
define arm_aapcscc void @foo(<2 x i8>* %p) {
L.entry:
%0 = load <2 x i8>* %p
call arm_aapcscc void @bar(<2 x
2015 Aug 05
0
[PATCH 2/8] Reorganize pitch_arm.h, so RTCD works for intrinsics functions as well.
---
celt/arm/arm_celt_map.c | 24 +++++++++++-
celt/arm/pitch_arm.h | 97 +++++++++++++++++++++++++++++++++----------------
2 files changed, 88 insertions(+), 33 deletions(-)
diff --git a/celt/arm/arm_celt_map.c b/celt/arm/arm_celt_map.c
index 0c9acff..cc6b706 100644
--- a/celt/arm/arm_celt_map.c
+++ b/celt/arm/arm_celt_map.c
@@ -94,9 +94,14 @@ void (*const
2013 May 23
2
ASM runtime detection and optimizations
I wrote a proof of concept for runtime detection of CPU capabilities
and selection of the optimized function. It follows the design that was
discussed on IRC.
Also, I noticed a small drawback: we must propagate the arch index
through functions which don't have the codec state as an argument.
However, if it looks good, I will continue to implement it.
Best regards,
--
Aurélien Zanelli
2015 Nov 05
0
AVX Optimizations
Velea, Radu wrote:
> Yes,
>
> Thank you. I'll follow up with the AVX code and tests for pitch code.
Actually, I lied. Because you update opus_select_arch(), you can now
return a value for arch (4) that is larger than the maximum we currently
support (3). This doesn't actually cause failures, because we mask with
OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke
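The masking behaviour described above can be sketched with a small function-pointer table. This is an illustration, not the Opus code: `ARCH_MASK_SKETCH`, `ADD_IMPL`, `table_index` and `add_dispatch` are hypothetical names, and the mask value of 3 is an assumption taken from the stated maximum supported arch index.

```c
#include <assert.h>

/* Hypothetical value: highest supported arch index is 3, so the mask is 3. */
#define ARCH_MASK_SKETCH 3

typedef int (*impl_fn)(int a, int b);

static int add_c(int a, int b)    { return a + b; }  /* generic fallback */
static int add_simd(int a, int b) { return a + b; }  /* stand-in for an optimized version */

/* One entry per supported arch index, 0 through 3. */
static const impl_fn ADD_IMPL[ARCH_MASK_SKETCH + 1] = {
    add_c, add_c, add_simd, add_simd
};

/* Masking keeps the index inside the table for any non-negative arch. */
static int table_index(int arch)
{
    assert(arch >= 0);
    return arch & ARCH_MASK_SKETCH;
}

/* Dispatch through the table using the masked index. */
static int add_dispatch(int a, int b, int arch)
{
    return ADD_IMPL[table_index(arch)](a, b);
}
```

Under these assumptions an arch value of 4 masks to 0, so the lookup stays in bounds but lands on a different entry than the detection code intended.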