thr3ads.net - search: "vldr"

Displaying 20 results from an estimated 33 matches for "vldr".

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 21

Hi Andy, I'm working on defining a new machine model for my target, but I don't understand how to define an in-order machine (reservation tables) in the new model. For example, if the target has IF, ID, EX, and WB stages, should I do: let BufferSize=0 in { def IF: ProcResource<1>; def ID: ProcResource<1>; def EX: ProcResource<1>; def WB: ProcResource<1>; } def :

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

...Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
>     extend: @ @extend
>     @ BB#0:
>     vldr d16, [r0]
>     vmovl.s16 q8, d16
>     vstmia r1, {d16, d17}
>     vldr d16, [r0, #8]
>     add r0, r1, #16
>     vmovl.s16 q8, d16
>     vstmia r0, {d16, d17}
>     bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly f...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :) I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
    extend: @ @extend
    @ BB#0:
    vldr d16, [r0]
    vmovl.s16 q8, d16
    vstmia r1, {d16, d17}
    vldr d16, [r0, #8]
    add r0, r1, #16
    vmovl.s16 q8, d16
    vstmia r0, {d16, d17}
    bx lr
Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

Hello Jim, Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70, 5-71). Prior to the table, it does mention the accesses need to be "element" aligned, where I took

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux. I'm not sure if the effect of lowering load <4 x i16>* align 2 to vld1.16 was intentional in this change or not. If so, my question is what is t...

[LLVMdev] Vector type LOAD/STORE with post-increment.

2013 Jun 19

...rement for an out of tree backend. I see that ARM NEON supports such load/store, so I am using ARM NEON as an example of what to do. The problem is I can't get any C or C++ code example to actually generate vector load/store with post increment. I am talking about something like this: vldr d16, [sp, #8] Does anybody know any C/C++ code example that will generate such code (especially a loop)? Is this supported by the auto-vectorizer? Thanks.

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 15

On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote: > Hi all, > I hit this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched > > The small function will be scheduled as good code, but if opt inlines this function, the inlined part will be scheduled as bad code. A bug for this is welcome. Pretty soon, I’ll

[LLVMdev] Vector type LOAD/STORE with post-increment.

2013 Jun 19

On 19 June 2013 11:32, Francois Pichet <pichet2000 at gmail.com> wrote: > I am talking about something like this: > vldr d16, [sp, #8] > Hi Francois, This is just using the offset, not updating the register (see ARM ARM A8.5). Post-increment only has meaning if you write back the new value to the register, like: vldr d16, [sp], #8 Did you mean write-back, or just offset? Does anybody know any C/C++ code...

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 14

Hi all, I hit this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched. The small function is scheduled as good code, but if opt inlines this function, the inlined part is scheduled as bad code. So I wrote a simple test case, attached as a link (foo.c), and compiled it with two different methods: *method A:* *$clang -O3 foo.c -static -S

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

...On Behalf Of Peter Couperus Sent: Thursday, September 06, 2012 8:14 AM To: Jim Grosbach Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. Hello, Thanks again. We did try overestimating the alignment, and saw the vldr you reference here. It looks like a recent change (r161962?) did enable vld1 generation for this case (great!) on darwin, but not linux. I'm not sure if the effect of lowering load <4 x i16>* align 2 to vld1.16 was intentional in this change or not. If so, my question is what is the...

[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team

2011 May 26

Hi all, The LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community. If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com. Thanks, Evan Job description The Apple compiler team is seeking an engineer who is strongly

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

    ...sub sp, sp, #8
    str lr, [sp, #4]
    str r7, [sp]
    mov r7, sp
    sub sp, sp, #36
    str r0, [r7, #-4]
    vmov s0, r0
    str r1, [r7, #-8]
    vmov s1, r1
    str r2, [r7, #-12]
    vmov s2, r2
    vldr.32 s3, [r7, #-4]
    vldr.32 s4, [r7, #-8]
    vmul.f32 s3, s3, s4
    vstr.32 s3, [r7, #-16]
    vldr.32 s4, [r7, #-12]
    vcmpe.f32 s3, s4
    vmrs apsr_nzcv, fpscr
    vstr.32 s0, [sp, #16]
    vstr.32 s2, [sp, #12]
    vstr.32 s1, [sp, #8]...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

...Thursday, September 06, 2012 8:14 AM > To: Jim Grosbach > Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) > Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > > Hello, > > Thanks again. We did try overestimating the alignment, and saw the vldr you > reference here. > It looks like a recent change (r161962?) did enable vld1 generation for this > case (great!) on darwin, but not linux. > I'm not sure if the effect of lowering load <4 x i16>* align 2 to > vld1.16 was intentional in this change or not. > If s...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

...ay, September 06, 2012 8:14 AM > To: Jim Grosbach > Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) > Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > > Hello, > > Thanks again. We did try overestimating the alignment, and saw the > vldr you reference here. > It looks like a recent change (r161962?) did enable vld1 generation > for this case (great!) on darwin, but not linux. > I'm not sure if the effect of lowering load <4 x i16>* align 2 to > vld1.16 was intentional in this change or not. > If so, m...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 07

...>> To: Jim Grosbach >> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) >> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. >> >> Hello, >> >> Thanks again. We did try overestimating the alignment, and saw the >> vldr you reference here. >> It looks like a recent change (r161962?) did enable vld1 generation >> for this case (great!) on darwin, but not linux. >> I'm not sure if the effect of lowering load <4 x i16>* align 2 to >> vld1.16 was intentional in this change or no...

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 16

...-misched -mllvm -scheditins=false
per-operand cost model :
    Scale:
    push {lr}
    movw r12, :lower16:c
    movw lr, :lower16:b
    movw r3, #9216
    movt r12, :upper16:c
    mov r1, #0
    vmov.f64 d16, #3.000000e+00
    movt lr, :upper16:b
    movt r3, #244
    .LBB0_1:
    add r0, r12, r1
    add r2, lr, r1
    *vldr d17, [r0]*
    add r1, r1, #32
    vmul.f64 d17, d17, d16
    cmp r1, r3
    vstr d17, [r2]
    *vldr d17, [r0, #8]*
    vmul.f64 d17, d17, d16
    vstr d17, [r2, #8]
    *vldr d17, [r0, #16]*
    vmul.f64 d17, d17, d16
    vstr d17, [r2, #16]
    *vldr d17, [r0, #24]*
    vmul.f64 d17, d17, d16
    vstr d17,...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 07

...>> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu) > >> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON. > >> > >> Hello, > >> > >> Thanks again. We did try overestimating the alignment, and saw the > >> vldr you reference here. > >> It looks like a recent change (r161962?) did enable vld1 generation > >> for this case (great!) on darwin, but not linux. > >> I'm not sure if the effect of lowering load <4 x i16>* align 2 to > >> vld1.16 was intentional i...

Why does LLVM keep some loads in the loops even after applying the O3 optimization?

2019 Mar 28

...at the assembly code of a loop body which is created by applying O3 optimization. Here it is:
    .LBB4_19: @ %for.body.91
    @ =>This Inner Loop Header: Depth=1
    ldr r0, [r5]
    mov r1, r8
    add r0, r0, r7
    vldr s0, [r0]
    mov r0, r6
    vcvt.f64.f32 d0, s0
    vmov r2, r3, d0
    bl fprintf
    cmp r0, #0
    blt .LBB4_25
    @ BB#20: @ %for.cond.89
    @ in Loop: Header=BB4_19 Depth=1
    ldr r0, .LCPI4_2...

[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 09

...1_lane_f32(&sumi, tv.val[0], 0);
Accessing tv.val[0] and tv.val[1] directly seems to send these values through the stack, e.g.,
    f4: f3ba7085 vtrn.32 d7, d5
    f8: ed0b7b0f vstr d7, [fp, #-60]
    fc: ed0b5b0d vstr d5, [fp, #-52]
    ...
    114: ed1b6b09 vldr d6, [fp, #-36]
    118: ed1b7b0b vldr d7, [fp, #-44]
    11c: f2077d06 vadd.f32 d7, d7, d6
    120: f483780f vst1.32 {d7[0]}, [r3]
Can't you just use
    float32x2_t tv;
    tv = vadd_f32(vget_low_f32(SUMM), vget_high_f32(SUMM));
    tv = vpadd_f32(tv, tv);
(you...

[LLVMdev] RE : Vector argument passing abi for ARM ?

2012 Jul 05

...-mcpu=cortex-a9 -mattr=+neon,+neonfp -relocation-model=pic -o bugparam.s
with LLVM 3.0, it works, with LLVM 3.1 generated code contains a misaligned load:
    bar: @ @bar
    @ BB#0: @ %L.entry
    push {r11, lr}
    add r0, r1, #2
    vldr s0, [r1]
    vldr s2, [r0] # <= here load is misaligned
    vmovl.u8 q8, d0
    vmovl.u8 q9, d1
    vmovl.u16 q8, d16
    vmovl.u16 q9, d18
    vmov r0, r1, d16
    vmov r2, r3, d18
    bl zzz(PLT)
    pop {r11, pc}
with LLVM trunk, ass...
