Displaying 20 results from an estimated 700 matches similar to: "[PATCH] 02-Add CELT filter optimizations"
2013 May 21
0
[PATCH] 02-
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It
increase performance when using optimized macros (ex: ARMv5E). A
possible side effect of loop unroll is that i don't check for odd length
here.
- Add NEON version of FIR filter and autocorr
--
Aur?lien Zanelli
Parrot SA
174, quai de Jemmapes
75010 Paris
France
-------------- next part --------------
diff --git
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
From: Jyri Sarha <jsarha at ti.com>
Semantics of inner_product_single have also been changed to contain
the final right shift and saturation so it can also be implemented in
the optimal way for the used platform. This change affects fixed point
calculations only.
I also added a new fixed point macro SATURATE32PSHR(x, shift, a). It
does pretty much the same thing as SATURATE32(PSHR32(x,
2013 May 21
0
regarding ARM NEON CELT filter optimizations
Hello Aurelien,
+ "vdup.s16 d8, %1;\n" //Duplicate num in d8 lane
+ "vdup.s16 q5, %4;\n" //Duplicate mem in q5 lane
+
+ /* We try to process 16 samples at a time */
+ "movs %5, %3, lsr #4;\n"
+ "beq .celt_fir1_process16_done_%=;\n"
+
+ ".celt_fir1_process16_%=:\n"
+ /* Load 16 x values in q0, q1 lanes */
+
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com>
I optimized Speex resampler for NEON capable ARM CPUs. The first patch
should speed up resampling on any platform that can spare the
increased memory usage. It would be nice to have these merged to the
master branch. Please let me know if there is anything I can do to
help the the merge. The patches have been rebased on top of master
branch in
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
extend: @ @extend
@ BB#0:
vldr d16, [r0]
vmovl.s16 q8, d16
vstmia r1, {d16, d17}
vldr d16, [r0, #8]
add r0, r1, #16
vmovl.s16 q8, d16
vstmia
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 5, 2012, at 4:58 PM, Jim Grosbach <grosbach at apple.com> wrote:
> Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16,
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hi Pete,
We ran into the same issue with generating vector loads/stores for vectors
with less than word alignment. It seems we took a similar approach to
solving the problem by modifying the logic in allowsUnalignedMemoryAccesses.
As you and Jim mentioned, it looks like the vld1/vst1 instructions should
support element aligned access for any armv7 implementation (I'm looking at
Table A3-1
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello,
Thanks again. We did try overestimating the alignment, and saw the vldr
you reference here.
It looks like a recent change (r161962?) did enable vld1 generation for
this case (great!) on darwin, but not linux.
I'm not sure if the effect of lowering load <4 x i16>* align 2 to
vld1.16 this was intentional in this change or not.
If so, my question is what is the preferable way to
2012 Sep 07
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Friday, September 07, 2012 10:57 AM
> To: David Peixotto
> Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
>
>
> On Sep 6, 2012, at 4:40 PM, David Peixotto
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
-----Original Message-----
From: Bob Wilson [mailto:bob.wilson at apple.com]
Sent: Thursday, September 06, 2012 3:39 PM
To: David Peixotto
Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
2012 Sep 07
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 4:40 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Thursday, September 06, 2012 3:39 PM
> To: David Peixotto
> Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Unaligned vector
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> Hi Pete,
>
> We ran into the same issue with generating vector loads/stores for vectors
> with less than word alignment. It seems we took a similar approach to
> solving the problem by modifying the logic in allowsUnalignedMemoryAccesses.
>
> As you and Jim mentioned, it looks like the
2020 Mar 31
2
[ARM] Register pressure with -mthumb forces register reload before each call
Hi,
Compiling attached test-case, which is reduced version of of
uECC_shared_secret from tinycrypt library [1], with
--target=arm-linux-gnueabi -march=armv6-m -Oz -S
results in reloading of register holding function's address before
every call to blx:
ldr r3, .LCPI0_0
blx r3
mov r0, r6
mov r1, r5
mov r2, r4
ldr r3,
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim,
Thank you for the response. I may be confused about the alignment rules
here.
I had been looking at the ARM RVCT Assembler Guide, which seems to
indicate vld1.16 operates on 16-bit aligned data, unless I am
misinterpreting their table
(Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
Prior to the table, It does mention the accesses need to be "element"
aligned, where I took
2020 Apr 15
4
[ARM] Register pressure with -mthumb forces register reload before each call
Hi,
I have attached WIP patch for adding foldMemoryOperand to Thumb1InstrInfo.
For the following case:
void f(int x, int y, int z)
{
void bar(int, int, int);
bar(x, y, z);
bar(x, z, y);
bar(y, x, z);
bar(y, y, x);
}
it calls foldMemoryOperand twice, and thus converts two calls from blx to bl.
callMI->dump() shows the function name "bar" correctly, however in
generated
2012 Sep 21
2
[LLVMdev] ARM aapcs calling convention for small vectors
Hi all,
I was wondering if ARM aapcs calling convention defines how to pass small vectors as parameter to a routine.
By small vectors, I mean with size less than a 32-bit integer. For instance if we consider following code:
; ModuleID = 'smallvect.ll'
define arm_aapcscc void @foo(<2 x i8>* %p) {
L.entry:
%0 = load <2 x i8>* %p
call arm_aapcscc void @bar(<2 x
2012 Jun 24
0
[LLVMdev] Request for merge: GHC/ARM calling convention.
Hi Karel,
I understand this patch has already been merged (to 3.0), so don't
take my question as stopping the merge to head, I'm just making sure I
got it right... The rest looks correct.
+ CCIfType<[v2f64], CCAssignToReg<[Q4, Q5]>>,
+ CCIfType<[f64], CCAssignToReg<[D8, D9, D10, D11]>>,
+ CCIfType<[f32], CCAssignToReg<[S16, S17, S18, S19, S20, S21, S22,
2015 Aug 05
0
[PATCH 2/8] Reorganize pitch_arm.h, so RTCD works for intrinsics functions as well.
---
celt/arm/arm_celt_map.c | 24 +++++++++++-
celt/arm/pitch_arm.h | 97 +++++++++++++++++++++++++++++++++----------------
2 files changed, 88 insertions(+), 33 deletions(-)
diff --git a/celt/arm/arm_celt_map.c b/celt/arm/arm_celt_map.c
index 0c9acff..cc6b706 100644
--- a/celt/arm/arm_celt_map.c
+++ b/celt/arm/arm_celt_map.c
@@ -94,9 +94,14 @@ void (*const
2013 May 23
2
ASM runtime detection and optimizations
I wrote a proof of concept regarding the cpu capabilities runtime
detection and choice of optimized function. I follow design which had
been discussed on IRC.
Also, i notice a little drawback: we must propagate the arch index
through functions which don't have codec state as argument.
However, if it's look good, i will continue to implement it.
Best regards,
--
Aur?lien Zanelli
2015 Nov 05
0
AVX Optimizations
Velea, Radu wrote:
> Yes,
>
> Thank you. I'll follow up with the AVX code and tests for pitch code.
Actually, I lied. Because you update opus_select_arch(), you can now
return a value for arch (4) that is larger than the maximum we currently
support (3). This doesn't actually cause failures, because we mask with
OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke