similar to: [LLVMdev] Unaligned vector memory access for ARM/NEON.

Displaying 20 results from an estimated 8000 matches similar to: "[LLVMdev] Unaligned vector memory access for ARM/NEON."

2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim, Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70,5-71). Prior to the table, It does mention the accesses need to be "element" aligned, where I took
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
VLD1 expects a 64-bit aligned address unless the target explicitly days that unaligned loads are OK. For your situation, either the subtarget should set AllowsUnalignedMem to true (if that's accurate), or the load address should be made 64-bit aligned. -Jim On Sep 5, 2012, at 2:42 PM, Peter Couperus <peter.couperus at st.com> wrote: > Hello all, > > I am a first time writer
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux. I'm not sure if the effect of lowering load <4 x i16>* align 2 to vld1.16 this was intentional in this change or not. If so, my question is what is the preferable way to
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote: > Hi Pete, > > We ran into the same issue with generating vector loads/stores for vectors > with less than word alignment. It seems we took a similar approach to > solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. > > As you and Jim mentioned, it looks like the
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: extend: @ @extend @ BB#0: vldr d16, [r0] vmovl.s16 q8, d16 vstmia r1, {d16, d17} vldr d16, [r0, #8] add r0, r1, #16 vmovl.s16 q8, d16 vstmia
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hi Pete, We ran into the same issue with generating vector loads/stores for vectors with less than word alignment. It seems we took a similar approach to solving the problem by modifying the logic in allowsUnalignedMemoryAccesses. As you and Jim mentioned, it looks like the vld1/vst1 instructions should support element aligned access for any armv7 implementation (I'm looking at Table A3-1
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
-----Original Message----- From: Bob Wilson [mailto:bob.wilson at apple.com] Sent: Thursday, September 06, 2012 3:39 PM To: David Peixotto Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
2012 Sep 07
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 6, 2012, at 4:40 PM, David Peixotto <dpeixott at codeaurora.org> wrote: > -----Original Message----- > From: Bob Wilson [mailto:bob.wilson at apple.com] > Sent: Thursday, September 06, 2012 3:39 PM > To: David Peixotto > Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Unaligned vector
2012 Sep 07
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
> -----Original Message----- > From: Bob Wilson [mailto:bob.wilson at apple.com] > Sent: Friday, September 07, 2012 10:57 AM > To: David Peixotto > Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > > > On Sep 6, 2012, at 4:40 PM, David Peixotto
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 5, 2012, at 4:58 PM, Jim Grosbach <grosbach at apple.com> wrote: > Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) > > I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: > extend: @ @extend > @ BB#0: > vldr d16,
2012 Sep 10
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On Sep 7, 2012, at 4:46 PM, David Peixotto <dpeixott at codeaurora.org> wrote: >> Note that this won't work on big-endian targets where changing the >> vld1/vst1 element size actually changes the behavior of the instructions. > > Ok, so would it be best to put an explicit test for endianess in the code? > The td files already contain restrictions some for endianess
2012 Sep 10
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
On 10 September 2012 06:44, Bob Wilson <bob.wilson at apple.com> wrote: > I don't know if anyone actually uses arm processors in big-endian mode, but it shouldn't be too hard to conditionalize it. If it does turn out to be difficult for some reason, we should at least have comments to indicate where the endian assumptions are being made. AFAICR, people used to use big-endian on
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com> I optimized Speex resampler for NEON capable ARM CPUs. The first patch should speed up resampling on any platform that can spare the increased memory usage. It would be nice to have these merged to the master branch. Please let me know if there is anything I can do to help the the merge. The patches have been rebased on top of master branch in
2012 Oct 11
2
[LLVMdev] vselect on ARM/NEON
Hello, We've run into a couple of cases where we'd like to use select on vector types, but vselect handling is absent from the ARM backend. Would there be any potential harm by marking VSELECT as Expand on ARM targets with NEON? Adding this seems to fix the following PR's: http://llvm.org/bugs/show_bug.cgi?id=13831 http://llvm.org/bugs/show_bug.cgi?id=13961 Thanks! Pete
2015 Jan 05
2
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote: >>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing
2012 Oct 11
0
[LLVMdev] vselect on ARM/NEON
Seems reasonable to me. Plain 'SELECT' is already marked expand for vector types. I bet that just didn't get updates when VSELECT was introduced. -Jim On Oct 11, 2012, at 10:25 AM, Peter Couperus <peter.couperus at st.com> wrote: > Hello, > > We've run into a couple of cases where we'd like to use select on vector types, but vselect handling is absent from
2012 Oct 11
1
[LLVMdev] vselect on ARM/NEON
If you mark VSELECT as 'expand' then it will be expanded to a sequence of AND/OR/XOR, which is pretty efficient (found in LegalizeVectorOps.cpp ExpandVSELECT). On Oct 11, 2012, at 11:05 AM, Jim Grosbach <grosbach at apple.com> wrote: > Seems reasonable to me. Plain 'SELECT' is already marked expand for vector types. I bet that just didn't get updates when VSELECT
2015 Jan 05
4
[LLVMdev] NEON intrinsics preventing redundant load optimization?
Hi all, Sorry for arriving late to the party. First, some context: vld1 is not the same as a pointer dereference. The alignment requirements are different (which I saw you hacked around in your testcase using attribute((aligned(4))) ), and in big endian environments they do totally different things (VLD1 does element-wise byteswapping and pointer dereferences byteswaps the entire 128-bit
2014 Dec 08
2
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 8 Dec 2014, at 00:13, Renato Golin <renato.golin at linaro.org> wrote: > On 7 December 2014 at 19:15, Simon Taylor <simontaylor1 at ntlworld.com> wrote: >> Is there something about the use of intrinsics that prevents the compiler optimizing out the redundant store on the stack? Is there any hope for this improving in the future, or anything I can do now to improve the
2012 Sep 21
0
[LLVMdev] Question about LLVM NEON intrinsics
On 21 September 2012 09:28, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>, <16 x float>) nounwind readnone > > llc fails with following message: > > SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250, 0x2258050, 0x2258150 [ORD=3] [ID=0] > > LLVM ERROR: Do not