search for: vldr

Displaying 20 results from an estimated 33 matches for "vldr".

2013 Oct 21
1
[LLVMdev] MI scheduler produce badly code with inline function
Hi Andy, I'm working on defining a new machine model for my target, but I don't understand how to define the in-order machine (reservation tables) in the new model. For example, if the target has IF ID EX WB stages, should I do: let BufferSize=0 in { def IF: ProcResource<1>; def ID: ProcResource<1>; def EX: ProcResource<1>; def WB: ProcResource<1>; } def :
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) > > I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get: > extend: @ @extend > @ BB#0: > vldr d16, [r0] > vmovl.s16 q8, d16 > vstmia r1, {d16, d17} > vldr d16, [r0, #8] > add r0, r1, #16 > vmovl.s16 q8, d16 > vstmia r0, {d16, d17} > bx lr > > Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly f...
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:

    extend:                         @ @extend
    @ BB#0:
        vldr d16, [r0]
        vmovl.s16 q8, d16
        vstmia r1, {d16, d17}
        vldr d16, [r0, #8]
        add r0, r1, #16
        vmovl.s16 q8, d16
        vstmia r0, {d16, d17}
        bx lr

Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI...
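As a rough illustration of what the `extend` listing computes, here is a minimal C sketch. The IR is not shown in the snippet, so the signature, the element count of eight, and the use of `memcpy` for an alignment-safe load are all assumptions made for illustration, not the poster's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction: sign-extend eight 16-bit values to 32-bit.
 * The listing does this in two halves of four lanes each
 * (vldr + vmovl.s16 + vstmia, twice). */
void extend(const void *src, int32_t *dst)
{
    int16_t tmp[8];

    assert(src != NULL && dst != NULL);

    /* memcpy makes no alignment assumption about src, so this is valid C
     * even when src is only 2-byte (or 1-byte) aligned. */
    memcpy(tmp, src, sizeof tmp);

    for (int i = 0; i < 8; ++i)
        dst[i] = tmp[i];            /* implicit sign extension */
}
```

Whether a given compiler lowers the load to `vldr`, `vld1`, or something else depends on the alignment it can prove and on the target; the sketch only pins down the source-level behaviour.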
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim, Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70,5-71). Prior to the table, It does mention the accesses need to be "element" aligned, where I took
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux. I'm not sure if the effect of lowering load <4 x i16>* align 2 to vld1.16 this was intentional in this change or not. If so, my question is what is t...
2013 Jun 19
3
[LLVMdev] Vector type LOAD/STORE with post-increment.
...rement for an out of tree backend. I see that ARM NEON supports such load/store, so I am using ARM NEON as an example of what to do. The problem is I can't get any C or C++ code example to actually generate vector load/store with post increment. I am talking about something like this: vldr d16, [sp, #8] Does anybody know any C/C++ code example that will generate such code (especially loop)? Is this supported by the auto-vectorizer? Thanks.
2013 Oct 15
0
[LLVMdev] MI scheduler produce badly code with inline function
On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote: > Hi all, > I meet this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched > > The small function will be scheduled as good code, but if opt inline this function, the inline part will be scheduled as bad code. A bug for this is welcome. Pretty soon, I'll
2013 Jun 19
0
[LLVMdev] Vector type LOAD/STORE with post-increment.
On 19 June 2013 11:32, Francois Pichet <pichet2000 at gmail.com> wrote: > I am talking about something like this: > vldr d16, [sp, #8] > Hi Francois, This is just using the offset, not updating the register (see ARM ARM A8.5). Post-increment only has meaning if you write-back the new value to the register like: vldr d16, [sp], #8 Did you mean write-back? or just offset? Does anybody know any C/C++ code...
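The distinction drawn in the reply above (plain offset versus write-back) can be modelled in portable C. This is only a sketch of the addressing semantics; the function names are hypothetical, and nothing here guarantees that a compiler will select a post-indexed instruction for it.

```c
#include <assert.h>
#include <stddef.h>

/* Offset addressing: read from base + offset; base itself is unchanged. */
double load_offset(const double *base, ptrdiff_t offset)
{
    assert(base != NULL);
    return base[offset];
}

/* Post-increment (write-back): read from *base, then advance *base. */
double load_post_inc(const double **base, ptrdiff_t step)
{
    assert(base != NULL && *base != NULL);
    double value = **base;
    *base += step;                  /* the write-back of the new address */
    return value;
}

/* A loop of the shape that naturally maps to post-increment loads. */
double sum_post_inc(const double *p, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += load_post_inc(&p, 1);
    return sum;
}
```

In the first function the pointer can be reused for other offsets; in the second the caller's pointer moves on every load, which is the property the reply says the original example lacks.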
2013 Oct 14
2
[LLVMdev] MI scheduler produce badly code with inline function
Hi all, I meet this problem when compiling the STREAM benchmark ( http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched The small function will be scheduled as good code, but if opt inline this function, the inline part will be scheduled as bad code. so I rewrite a simple code as attached link (foo.c), and compiled with two different methods: *method A:* *$clang -O3 foo.c -static -S
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...On Behalf Of Peter Couperus Sent: Thursday, September 06, 2012 8:14 AM To: Jim Grosbach Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux. I'm not sure if the effect of lowering load <4 x i16>* align 2 to vld1.16 this was intentional in this change or not. If so, my question is what is the...
2011 May 26
2
[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team
Hi all, LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community. If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com Thanks, Evan Job description The Apple compiler team is seeking an engineer who is strongly
2011 May 27
1
[LLVMdev] Question about ARM/vfp/NEON code generation
...sub sp, sp, #8 str lr, [sp, #4] str r7, [sp] mov r7, sp sub sp, sp, #36 str r0, [r7, #-4] vmov s0, r0 str r1, [r7, #-8] vmov s1, r1 str r2, [r7, #-12] vmov s2, r2 vldr.32 s3, [r7, #-4] vldr.32 s4, [r7, #-8] vmul.f32 s3, s3, s4 vstr.32 s3, [r7, #-16] vldr.32 s4, [r7, #-12] vcmpe.f32 s3, s4 vmrs apsr_nzcv, fpscr vstr.32 s0, [sp, #16] vstr.32 s2, [sp, #12] vstr.32 s1, [sp, #8]...
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...Thursday, September 06, 2012 8:14 AM > To: Jim Grosbach > Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) > Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > > Hello, > > Thanks again. We did try overestimating the alignment, and saw the vldr you > reference here. > It looks like a recent change (r161962?) did enable vld1 generation for this > case (great!) on darwin, but not linux. > I'm not sure if the effect of lowering load <4 x i16>* align 2 to > vld1.16 this was intentional in this change or not. > If s...
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...ay, September 06, 2012 8:14 AM > To: Jim Grosbach > Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) > Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > > Hello, > > Thanks again. We did try overestimating the alignment, and saw the > vldr you reference here. > It looks like a recent change (r161962?) did enable vld1 generation > for this case (great!) on darwin, but not linux. > I'm not sure if the effect of lowering load <4 x i16>* align 2 to > vld1.16 this was intentional in this change or not. > If so, m...
2012 Sep 07
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...>> To: Jim Grosbach >> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) >> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. >> >> Hello, >> >> Thanks again. We did try overestimating the alignment, and saw the >> vldr you reference here. >> It looks like a recent change (r161962?) did enable vld1 generation >> for this case (great!) on darwin, but not linux. >> I'm not sure if the effect of lowering load <4 x i16>* align 2 to >> vld1.16 this was intentional in this change or no...
2013 Oct 16
3
[LLVMdev] MI scheduler produce badly code with inline function
...-misched -mllvm -scheditins=false per-operand cost model : Scale: push {lr} movw r12, :lower16:c movw lr, :lower16:b movw r3, #9216 movt r12, :upper16:c mov r1, #0 vmov.f64 d16, #3.000000e+00 movt lr, :upper16:b movt r3, #244 .LBB0_1: add r0, r12, r1 add r2, lr, r1 *vldr d17, [r0]* add r1, r1, #32 vmul.f64 d17, d17, d16 cmp r1, r3 vstr d17, [r2] * vldr d17, [r0, #8]* vmul.f64 d17, d17, d16 * * vstr d17, [r2, #8] * vldr d17, [r0, #16]* vmul.f64 d17, d17, d16 vstr d17, [r2, #16] * vldr d17, [r0, #24]* vmul.f64 d17, d17, d16 vstr d17,...
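For orientation, the listing above looks like the STREAM "Scale" kernel with a scalar of 3.0, unrolled four times (offsets 0, 8, 16, 24 and a stride of 32 bytes). The loop bound in `r3` (244 * 65536 + 9216 = 16,000,000 bytes) would correspond to 2,000,000 doubles. The attached foo.c is not part of the snippet, so the following C sketch is an assumption about its shape, with a hypothetical function name and signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a Scale kernel: b[i] = scalar * c[i].
 * The manual 4x unroll mirrors the four vldr/vmul/vstr groups in the
 * listing; an optimizing compiler would normally do this itself. */
void scale(double *b, const double *c, size_t n, double scalar)
{
    size_t i = 0;

    assert(b != NULL && c != NULL);

    for (; i + 4 <= n; i += 4) {
        b[i]     = scalar * c[i];
        b[i + 1] = scalar * c[i + 1];
        b[i + 2] = scalar * c[i + 2];
        b[i + 3] = scalar * c[i + 3];
    }
    for (; i < n; ++i)              /* remainder when n is not a multiple of 4 */
        b[i] = scalar * c[i];
}
```

In the listing each load is immediately followed by the multiply that consumes it, which is the kind of schedule the thread title complains about: the four independent iterations are not interleaved to hide load latency.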
2012 Sep 07
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...>> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) > >> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > >> > >> Hello, > >> > >> Thanks again. We did try overestimating the alignment, and saw the > >> vldr you reference here. > >> It looks like a recent change (r161962?) did enable vld1 generation > >> for this case (great!) on darwin, but not linux. > >> I'm not sure if the effect of lowering load <4 x i16>* align 2 to > >> vld1.16 this was intentional i...
2019 Mar 28
3
Why does LLVM keep some loads in the loops even after applying the O3 optimization?
...at the assembly code of a loop body which is created by applying O3 optimization. Here it is: .LBB4_19: @ %for.body.91 @ =>This Inner Loop Header: Depth=1 ldr r0, [r5] mov r1, r8 add r0, r0, r7 vldr s0, [r0] mov r0, r6 vcvt.f64.f32 d0, s0 vmov r2, r3, d0 bl fprintf cmp r0, #0 blt .LBB4_25 @ BB#20: @ %for.cond.89 @ in Loop: Header=BB4_19 Depth=1 ldr r0, .LCPI4_2...
2014 Dec 09
1
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...1_lane_f32(&sumi, tv.val[0], 0); Accessing tv.val[0] and tv.val[1] directly seems to send these values through the stack, e.g., f4: f3ba7085 vtrn.32 d7, d5 f8: ed0b7b0f vstr d7, [fp, #-60] fc: ed0b5b0d vstr d5, [fp, #-52] ... 114: ed1b6b09 vldr d6, [fp, #-36] 118: ed1b7b0b vldr d7, [fp, #-44] 11c: f2077d06 vadd.f32 d7, d7, d6 120: f483780f vst1.32 {d7[0]}, [r3] Can't you just use float32x2_t tv; tv = vadd_f32(vget_low_f32(SUMM), vget_high_f32(SUMM)); tv = vpadd_f32(tv, tv); (you...
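The reduction suggested in the review (add the low and high halves, then a pairwise add) can be checked with a portable scalar model. The types and `model_*` helpers below are hypothetical stand-ins for `float32x2_t`, `float32x4_t`, `vget_low_f32`, `vget_high_f32`, `vadd_f32` and `vpadd_f32`; this is a sketch of the arithmetic only and does not use `arm_neon.h`.

```c
#include <assert.h>

typedef struct { float v[2]; } f32x2;   /* stand-in for float32x2_t */
typedef struct { float v[4]; } f32x4;   /* stand-in for float32x4_t */

static f32x2 model_get_low(f32x4 q)  { return (f32x2){{ q.v[0], q.v[1] }}; }
static f32x2 model_get_high(f32x4 q) { return (f32x2){{ q.v[2], q.v[3] }}; }

/* Lane-wise add, as vadd_f32 does. */
static f32x2 model_add(f32x2 a, f32x2 b)
{
    return (f32x2){{ a.v[0] + b.v[0], a.v[1] + b.v[1] }};
}

/* Pairwise add, as vpadd_f32 does: { a0 + a1, b0 + b1 }. */
static f32x2 model_padd(f32x2 a, f32x2 b)
{
    return (f32x2){{ a.v[0] + a.v[1], b.v[0] + b.v[1] }};
}

/* Sum of all four lanes, following the two-step sequence in the review. */
float horizontal_sum(f32x4 summ)
{
    f32x2 tv = model_add(model_get_low(summ), model_get_high(summ));
    tv = model_padd(tv, tv);
    return tv.v[0];
}

/* Small self-check: 1 + 2 + 3 + 4 is exactly 10 in float. */
void horizontal_sum_selfcheck(void)
{
    f32x4 s = {{ 1.0f, 2.0f, 3.0f, 4.0f }};
    assert(horizontal_sum(s) == 10.0f);
}
```

Because the intermediate stays in a single two-lane value rather than being accessed as `tv.val[0]` and `tv.val[1]`, there is no struct for the compiler to spill, which appears to be the point of the suggestion.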
2012 Jul 05
2
[LLVMdev] RE : Vector argument passing abi for ARM ?
...-mcpu=cortex-a9 -mattr=+neon,+neonfp -relocation-model=pic -o bugparam.s with LLVM 3.0, it works, with LLVM 3.1 generated code contains a misaligned load: bar: @ @bar @ BB#0: @ %L.entry push {r11, lr} add r0, r1, #2 vldr s0, [r1] vldr s2, [r0] # <= here load is misaligned vmovl.u8 q8, d0 vmovl.u8 q9, d1 vmovl.u16 q8, d16 vmovl.u16 q9, d18 vmov r0, r1, d16 vmov r2, r3, d18 bl zzz(PLT) pop {r11, pc} with LLVM trunk, ass...