Displaying 20 results from an estimated 33 matches for "vldr".
2013 Oct 21
1
[LLVMdev] MI scheduler produce badly code with inline function
Hi Andy, I'm working on defining new machine model for my target,
But I don't understand how to define the in-order machine (reservation
tables) in new model.
For example, if target has IF ID EX WB stages
should I do:
let BufferSize=0 in {
def IF: ProcResource<1>; def ID: ProcResource<1>;
def EX: ProcResource<1>; def WB: ProcResource<1>;
}
def :
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly f...
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
extend: @ @extend
@ BB#0:
vldr d16, [r0]
vmovl.s16 q8, d16
vstmia r1, {d16, d17}
vldr d16, [r0, #8]
add r0, r1, #16
vmovl.s16 q8, d16
vstmia r0, {d16, d17}
bx lr
Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI...
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim,
Thank you for the response. I may be confused about the alignment rules
here.
I had been looking at the ARM RVCT Assembler Guide, which seems to
indicate vld1.16 operates on 16-bit aligned data, unless I am
misinterpreting their table
(Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
Prior to the table, It does mention the accesses need to be "element"
aligned, where I took
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello,
Thanks again. We did try overestimating the alignment, and saw the vldr
you reference here.
It looks like a recent change (r161962?) did enable vld1 generation for
this case (great!) on darwin, but not linux.
I'm not sure if the effect of lowering load <4 x i16>* align 2 to
vld1.16 this was intentional in this change or not.
If so, my question is what is t...
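The alignment question in this thread can be illustrated with a small check in C. A `<4 x i16>` value occupies 8 bytes, but `align 2` only promises that the address is a multiple of the 2-byte element size. The sketch below is an illustration only: `is_aligned` is a hypothetical helper, not an LLVM or ARM API, and it says nothing about which instruction a given compiler version will actually select.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if addr is a multiple of alignment.
   alignment must be a nonzero power of two. */
static bool is_aligned(uintptr_t addr, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (uintptr_t)(alignment - 1)) == 0;
}
```

An address such as 0x1002 passes the check for alignment 2 (element aligned) but fails it for alignment 8 (the full vector size), which is the case the poster describes as `load <4 x i16>* align 2`.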
2013 Jun 19
3
[LLVMdev] Vector type LOAD/STORE with post-increment.
...rement for an
out of tree backend.
I see that that ARM NEON support such load/store so I am using ARM NEON as
an example of what to do.
The problem is I can't get any C or C++ code example to actually generate
vector load/store with post increment.
I am talking about something like this:
vldr d16, [sp, #8]
Does anybody know any C/C++ code example that will generate such code
(especially loop)? Is this supported by the auto-vectorizer?
Thanks.
2013 Oct 15
0
[LLVMdev] MI scheduler produce badly code with inline function
On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote:
> Hi all,
> I meet this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched
>
> The small function will be scheduled as good code, but if opt inline this function, the inline part will be scheduled as bad code.
A bug for this is welcome. Pretty soon, I’ll
2013 Jun 19
0
[LLVMdev] Vector type LOAD/STORE with post-increment.
On 19 June 2013 11:32, Francois Pichet <pichet2000 at gmail.com> wrote:
> I am talking about something like this:
> vldr d16, [sp, #8]
>
Hi Francois,
This is just using the offset, not updating the register (see ARM ARM
A8.5). Post-increment only has meaning if you write-back the new value to
the register like:
vldr d16, [sp], #8
Did you mean write-back? or just offset?
Does anybody know any C/C++ code...
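The distinction drawn in the reply above, a plain offset versus write-back to the base register, can be modelled in a few lines of C. This is only an illustrative sketch: `load_offset` and `load_post_inc` are hypothetical helper names, not ARM or LLVM APIs, and the model treats memory as an array of 8-byte slots rather than emitting any real instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset addressing: read from base + offset; the base is left unchanged. */
static uint64_t load_offset(const uint64_t *base, size_t offset_bytes) {
    assert(offset_bytes % sizeof(uint64_t) == 0);
    return base[offset_bytes / sizeof(uint64_t)];
}

/* Post-increment addressing: read from the current base, then write the
   advanced address back into the base pointer. */
static uint64_t load_post_inc(const uint64_t **base, size_t step_bytes) {
    assert(step_bytes % sizeof(uint64_t) == 0);
    uint64_t value = **base;
    *base += step_bytes / sizeof(uint64_t);
    return value;
}
```

With `p` pointing at a three-element array, `load_offset(p, 8)` reads the second element and leaves `p` where it was, while `load_post_inc(&p, 8)` reads the first element and leaves `p` pointing at the second. Only the second form changes the base, which is what "post-increment" means in the reply.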
2013 Oct 14
2
[LLVMdev] MI scheduler produce badly code with inline function
Hi all,
I meet this problem when compiling the STREAM benchmark (
http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched
The small function will be scheduled as good code, but if opt inline this
function, the inline part will be scheduled as bad code.
so I rewrite a simple code as attached link (foo.c), and compiled with two
different methods:
*method A:*
*$clang -O3 foo.c -static -S
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...On
Behalf Of Peter Couperus
Sent: Thursday, September 06, 2012 8:14 AM
To: Jim Grosbach
Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu)
Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello,
Thanks again. We did try overestimating the alignment, and saw the vldr you
reference here.
It looks like a recent change (r161962?) did enable vld1 generation for this
case (great!) on darwin, but not linux.
I'm not sure if the effect of lowering load <4 x i16>* align 2 to
vld1.16 this was intentional in this change or not.
If so, my question is what is the...
2011 May 26
2
[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team
Hi all,
LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community.
If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com
Thanks,
Evan
Job description
The Apple compiler team is seeking an engineer who is strongly
2011 May 27
1
[LLVMdev] Question about ARM/vfp/NEON code generation
...sub sp, sp, #8
str lr, [sp, #4]
str r7, [sp]
mov r7, sp
sub sp, sp, #36
str r0, [r7, #-4]
vmov s0, r0
str r1, [r7, #-8]
vmov s1, r1
str r2, [r7, #-12]
vmov s2, r2
vldr.32 s3, [r7, #-4]
vldr.32 s4, [r7, #-8]
vmul.f32 s3, s3, s4
vstr.32 s3, [r7, #-16]
vldr.32 s4, [r7, #-12]
vcmpe.f32 s3, s4
vmrs apsr_nzcv, fpscr
vstr.32 s0, [sp, #16]
vstr.32 s2, [sp, #12]
vstr.32 s1, [sp, #8]...
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...Thursday, September 06, 2012 8:14 AM
> To: Jim Grosbach
> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu)
> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
>
> Hello,
>
> Thanks again. We did try overestimating the alignment, and saw the vldr you
> reference here.
> It looks like a recent change (r161962?) did enable vld1 generation for this
> case (great!) on darwin, but not linux.
> I'm not sure if the effect of lowering load <4 x i16>* align 2 to
> vld1.16 this was intentional in this change or not.
> If s...
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...ay, September 06, 2012 8:14 AM
> To: Jim Grosbach
> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu)
> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
>
> Hello,
>
> Thanks again. We did try overestimating the alignment, and saw the
> vldr you reference here.
> It looks like a recent change (r161962?) did enable vld1 generation
> for this case (great!) on darwin, but not linux.
> I'm not sure if the effect of lowering load <4 x i16>* align 2 to
> vld1.16 this was intentional in this change or not.
> If so, m...
2012 Sep 07
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...>> To: Jim Grosbach
>> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu)
>> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
>>
>> Hello,
>>
>> Thanks again. We did try overestimating the alignment, and saw the
>> vldr you reference here.
>> It looks like a recent change (r161962?) did enable vld1 generation
>> for this case (great!) on darwin, but not linux.
>> I'm not sure if the effect of lowering load <4 x i16>* align 2 to
>> vld1.16 this was intentional in this change or no...
2013 Oct 16
3
[LLVMdev] MI scheduler produce badly code with inline function
...-misched -mllvm -scheditins=false
per-operand cost model :
Scale:
push {lr}
movw r12, :lower16:c
movw lr, :lower16:b
movw r3, #9216
movt r12, :upper16:c
mov r1, #0
vmov.f64 d16, #3.000000e+00
movt lr, :upper16:b
movt r3, #244
.LBB0_1:
add r0, r12, r1
add r2, lr, r1
*vldr d17, [r0]*
add r1, r1, #32
vmul.f64 d17, d17, d16
cmp r1, r3
vstr d17, [r2]
*vldr d17, [r0, #8]*
vmul.f64 d17, d17, d16
vstr d17, [r2, #8]
*vldr d17, [r0, #16]*
vmul.f64 d17, d17, d16
vstr d17, [r2, #16]
*vldr d17, [r0, #24]*
vmul.f64 d17, d17, d16
vstr d17,...
2012 Sep 07
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...>> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu)
> >> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
> >>
> >> Hello,
> >>
> >> Thanks again. We did try overestimating the alignment, and saw the
> >> vldr you reference here.
> >> It looks like a recent change (r161962?) did enable vld1 generation
> >> for this case (great!) on darwin, but not linux.
> >> I'm not sure if the effect of lowering load <4 x i16>* align 2 to
> >> vld1.16 this was intentional i...
2019 Mar 28
3
Why does LLVM keep some loads in the loops even after applying the O3 optimization?
...at the assembly code of a loop body which is created by
applying O3 optimization. Here it is:
.LBB4_19: @ %for.body.91
@ =>This Inner Loop Header: Depth=1
ldr r0, [r5]
mov r1, r8
add r0, r0, r7
vldr s0, [r0]
mov r0, r6
vcvt.f64.f32 d0, s0
vmov r2, r3, d0
bl fprintf
cmp r0, #0
blt .LBB4_25
@ BB#20: @ %for.cond.89
@ in Loop: Header=BB4_19 Depth=1
ldr r0, .LCPI4_2...
2014 Dec 09
1
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...1_lane_f32(&sumi, tv.val[0], 0);
Accessing tv.val[0] and tv.val[1] directly seems to send these values
through the stack, e.g.,
f4: f3ba7085 vtrn.32 d7, d5
f8: ed0b7b0f vstr d7, [fp, #-60]
fc: ed0b5b0d vstr d5, [fp, #-52]
...
114: ed1b6b09 vldr d6, [fp, #-36]
118: ed1b7b0b vldr d7, [fp, #-44]
11c: f2077d06 vadd.f32 d7, d7, d6
120: f483780f vst1.32 {d7[0]}, [r3]
Can't you just use
float32x2_t tv;
tv = vadd_f32(vget_low_f32(SUMM), vget_high_f32(SUMM));
tv = vpadd_f32(tv, tv);
(you...
2012 Jul 05
2
[LLVMdev] RE : Vector argument passing abi for ARM ?
...-mcpu=cortex-a9 -mattr=+neon,+neonfp -relocation-model=pic -o bugparam.s
with LLVM 3.0, it works, with LLVM 3.1 generated code contains a misaligned load:
bar: @ @bar
@ BB#0: @ %L.entry
push {r11, lr}
add r0, r1, #2
vldr s0, [r1]
vldr s2, [r0] # <= here load is misaligned
vmovl.u8 q8, d0
vmovl.u8 q9, d1
vmovl.u16 q8, d16
vmovl.u16 q9, d18
vmov r0, r1, d16
vmov r2, r3, d18
bl zzz(PLT)
pop {r11, pc}
with LLVM trunk, ass...