Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] ARM aapcs calling convention for small vectors"
2012 Jul 05
0
[LLVMdev] RE : Vector argument passing abi for ARM ?
Hi Sebastien,
> I also thought it was a bug, especially since it worked with LLVM 3.0, but since it is not defined by ABI, I was not sure if I need to submit it as a BUG.
yes it is a bug.
> I wanted to be sure that it is an actual BUG before submitting it and got the not-a-bug answer.
I didn't read Nadav's reply as saying there was no bug, in fact he explicitly
said in his email
2012 Jul 05
2
[LLVMdev] RE : Vector argument passing abi for ARM ?
Hi Duncan,
I also thought it was a bug, especially since it worked with LLVM 3.0, but since it is not defined by ABI, I was not sure if I need to submit it as a BUG.
I wanted to be sure that it is an actual BUG before submitting it and got the not-a-bug answer.
Here is a small example to reproduce the problem I'm experiencing:
; ModuleID = 'bugparam.ll'
target datalayout =
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 5, 2012, at 4:58 PM, Jim Grosbach <grosbach at apple.com> wrote:
> Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16,
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
extend: @ @extend
@ BB#0:
vldr d16, [r0]
vmovl.s16 q8, d16
vstmia r1, {d16, d17}
vldr d16, [r0, #8]
add r0, r1, #16
vmovl.s16 q8, d16
vstmia
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hi Pete,
We ran into the same issue with generating vector loads/stores for vectors
with less than word alignment. It seems we took a similar approach to
solving the problem by modifying the logic in allowsUnalignedMemoryAccesses.
As you and Jim mentioned, it looks like the vld1/vst1 instructions should
support element aligned access for any armv7 implementation (I'm looking at
Table A3-1
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello,
Thanks again. We did try overestimating the alignment, and saw the vldr
you reference here.
It looks like a recent change (r161962?) did enable vld1 generation for
this case (great!) on darwin, but not linux.
I'm not sure if the effect of lowering load <4 x i16>* align 2 to
vld1.16 this was intentional in this change or not.
If so, my question is what is the preferable way to
2012 Sep 07
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Friday, September 07, 2012 10:57 AM
> To: David Peixotto
> Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
>
>
> On Sep 6, 2012, at 4:40 PM, David Peixotto
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
-----Original Message-----
From: Bob Wilson [mailto:bob.wilson at apple.com]
Sent: Thursday, September 06, 2012 3:39 PM
To: David Peixotto
Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
2013 May 21
0
[PATCH] 02-
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It
increase performance when using optimized macros (ex: ARMv5E). A
possible side effect of loop unroll is that i don't check for odd length
here.
- Add NEON version of FIR filter and autocorr
--
Aur?lien Zanelli
Parrot SA
174, quai de Jemmapes
75010 Paris
France
-------------- next part --------------
diff --git
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> Hi Pete,
>
> We ran into the same issue with generating vector loads/stores for vectors
> with less than word alignment. It seems we took a similar approach to
> solving the problem by modifying the logic in allowsUnalignedMemoryAccesses.
>
> As you and Jim mentioned, it looks like the
2012 Sep 07
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 4:40 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Thursday, September 06, 2012 3:39 PM
> To: David Peixotto
> Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Unaligned vector
2013 May 21
2
[PATCH] 02-Add CELT filter optimizations
Please ignore my previous mail and patch, there is a new version :).
Patch changes are:
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It
increase performance when using optimized macros (ex: ARMv5E). A
possible side effect of loop unroll is that i don't check for odd length
here.
- Add NEON version of FIR filter and autocorr
- Add a section in autoconf in order to
2012 Jul 05
0
[LLVMdev] Vector argument passing abi for ARM ?
Hi Sebastien,
> Thanks for the quick answer, how do I know which type is legal/illegal with respect to calling convention ?
the code generators are supposed to produce working code no matter what the
parameter type is. The fact that the ARM ABI doesn't specify how <2 x i8>
is passed just means that the code generators can pass it using whatever
technique it feels like (since it
2012 Jul 05
3
[LLVMdev] Vector argument passing abi for ARM ?
Hi Rotem,
Thanks for the quick answer, how do I know which type is legal/illegal with respect to calling convention ?
Best Regards
Seb
> -----Original Message-----
> From: Rotem, Nadav [mailto:nadav.rotem at intel.com]
> Sent: Thursday, July 05, 2012 11:21 AM
> To: Sebastien DELDON-GNB; llvmdev at cs.uiuc.edu
> Subject: RE: Vector argument passing abi for ARM ?
>
> The
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim,
Thank you for the response. I may be confused about the alignment rules
here.
I had been looking at the ARM RVCT Assembler Guide, which seems to
indicate vld1.16 operates on 16-bit aligned data, unless I am
misinterpreting their table
(Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
Prior to the table, It does mention the accesses need to be "element"
aligned, where I took
2011 Sep 01
0
[PATCH 3/5] resample: Add NEON optimized inner_product_single for fixed point
From: Jyri Sarha <jsarha at ti.com>
Semantics of inner_product_single have also been changed to contain
the final right shift and saturation so it can also be implemented in
the optimal way for the used platform. This change affects fixed point
calculations only.
I also added a new fixed point macro SATURATE32PSHR(x, shift, a). It
does pretty much the same thing as SATURATE32(PSHR32(x,
2012 Sep 21
2
[LLVMdev] RE : Question about LLVM NEON intrinsics
Hello Renato,
You're pointing me at ARM intrinsics related to loads, problem that I've reported in original e-mail, is not support for vector loads, but support for 'vmaxs'. For instance, there is no vector loads of 16 floats in ARM ISA but it is legal to write in LLVM:
; ModuleID = 'vadd.ll'
target datalayout =
2012 Sep 21
0
[LLVMdev] ARM aapcs calling convention for small vectors
> I was wondering if ARM aapcs calling convention defines how to pass small vectors as parameter to a routine.
> By small vectors, I mean with size less than a 32-bit integer.
The AAPCS is silent on the matter. Vectors with size <= 32 aren't
recognised by the PCS so a language wanting to support them with
well-defined semantics (including LLVM) is expected to choose and
specify which
2014 Dec 07
3
[LLVMdev] NEON intrinsics preventing redundant load optimization?
Hi all,
I’m not sure if this is the right list, so apologies if not.
Doing some profiling I noticed some of my hand-tuned matrix multiply code with NEON intrinsics was much slower through a C++ template wrapper vs calling the intrinsics function directly. It turned out clang/LLVM was unable to eliminate a temporary even though the case seemed quite straightforward. Unfortunately any loads
2012 Sep 21
5
[LLVMdev] Question about LLVM NEON intrinsics
Hi all,
I would like to know if LLVM Neon intrinsics are designed to support only 'Legal' types for NEON units.
Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on following ll code:
; ModuleID = 'vmax.ll'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
target triple =