Displaying 13 results from an estimated 13 matches for "vstmia".
2012 Sep 06
1
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...'s LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 04...
2012 Sep 05
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...'s entirely possible that it's LLVM that's confused about the alignment requirements here. :)
I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
extend: @ @extend
@ BB#0:
vldr d16, [r0]
vmovl.s16 q8, d16
vstmia r1, {d16, d17}
vldr d16, [r0, #8]
add r0, r1, #16
vmovl.s16 q8, d16
vstmia r0, {d16, d17}
bx lr
Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 0406C), you're correct about the...
2012 Sep 05
3
[LLVMdev] Unaligned vector memory access for ARM/NEON.
Hello Jim,
Thank you for the response. I may be confused about the alignment rules
here.
I had been looking at the ARM RVCT Assembler Guide, which seems to
indicate vld1.16 operates on 16-bit aligned data, unless I am
misinterpreting their table
(Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
Prior to the table, it does mention the accesses need to be "element"
aligned, where I took
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...'s LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 040...
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...VM that's confused
> about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher
alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d
register, not a vld1 instruction. Similarly for the stores. According to the
ARM ARM (DDI 040...
2012 Sep 06
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...the alignment requirements here. :)
>>
>> I think I see, in general, where. I twiddled the IR to give it higher
> alignment (16 bytes) and get:
>> extend: @ @extend
>> @ BB#0:
>> vldr d16, [r0]
>> vmovl.s16 q8, d16
>> vstmia r1, {d16, d17}
>> vldr d16, [r0, #8]
>> add r0, r1, #16
>> vmovl.s16 q8, d16
>> vstmia r0, {d16, d17}
>> bx lr
>>
>> Note that we're using a plain vldr instruction here to load the d
> register, not a vld1 instruction. Similarly for the stores....
2012 Sep 06
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...the alignment requirements here. :)
>>
>> I think I see, in general, where. I twiddled the IR to give it higher
> alignment (16 bytes) and get:
>> extend: @ @extend
>> @ BB#0:
>> vldr d16, [r0]
>> vmovl.s16 q8, d16
>> vstmia r1, {d16, d17}
>> vldr d16, [r0, #8]
>> add r0, r1, #16
>> vmovl.s16 q8, d16
>> vstmia r0, {d16, d17}
>> bx lr
>>
>> Note that we're using a plain vldr instruction here to load the d
> register, not a vld1 instruction. Similarly for the stores....
2012 Sep 07
2
[LLVMdev] Unaligned vector memory access for ARM/NEON.
.... :)
>>>
>>> I think I see, in general, where. I twiddled the IR to give it higher
>> alignment (16 bytes) and get:
>>> extend: @ @extend
>>> @ BB#0:
>>> vldr d16, [r0]
>>> vmovl.s16 q8, d16
>>> vstmia r1, {d16, d17}
>>> vldr d16, [r0, #8]
>>> add r0, r1, #16
>>> vmovl.s16 q8, d16
>>> vstmia r0, {d16, d17}
>>> bx lr
>>>
>>> Note that we're using a plain vldr instruction here to load the d
>> register, not a vld1 instru...
2012 Sep 07
0
[LLVMdev] Unaligned vector memory access for ARM/NEON.
...n general, where. I twiddled the IR to give it
> >>> higher
> >> alignment (16 bytes) and get:
> >>> extend: @ @extend
> >>> @ BB#0:
> >>> vldr d16, [r0]
> >>> vmovl.s16 q8, d16
> >>> vstmia r1, {d16, d17}
> >>> vldr d16, [r0, #8]
> >>> add r0, r1, #16
> >>> vmovl.s16 q8, d16
> >>> vstmia r0, {d16, d17}
> >>> bx lr
> >>>
> >>> Note that we're using a plain vldr instruction here to load the d
>...
2012 May 10
0
[LLVMdev] MC Hammer Test results
...ng encode, decode, assemble and disassemble. I think it should store the
msbit and lsbit fields as operands and compute the mask at the instruction
selection phase.
[bug 5] echo 0x03 0x0b 0x80 0xec | ./llvm-mc -triple armv7 --show-inst
--show-encoding --disassemble
The bit pattern decodes as VSTMIA r0, {d0} when it should decode as FSTMIAX
r0, {d0}
These instructions are a bit of a curiosity in that they are pre-ARMv6 (VFPv1)
instruction mnemonics which were not superseded by UAL-style V* mnemonics. They
still exist in VFPv4 but their use is deprecated. Any VSTM's with odd numbered
im...
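
The distinction in [bug 5] can be sketched in a few lines of Python. This is only an illustration, not how llvm-mc decodes: the function name `classify_vfp_store_multiple` is hypothetical, and the field layout (P/U/D/L bits, Rn, Vd, the 0b1011 coprocessor field, imm8) is an assumption taken from the ARMv7 VSTM encoding rather than from the message itself. Under that assumption, an odd imm8 selects the FSTMX form, which is why the byte sequence in the report should not come out as VSTMIA.

```python
import struct


def classify_vfp_store_multiple(raw: bytes):
    """Split a 4-byte little-endian ARM word into VSTM/FSTMX fields.

    Returns (mnemonic, rn, first_dreg, dreg_count), or None when the word is
    not a 64-bit, increment-after VFP store multiple.
    """
    (word,) = struct.unpack("<I", raw)

    # Assumed layout: bits 27:25 = 0b110 and bits 11:8 = 0b1011 select the
    # 64-bit extension-register load/store-multiple space.
    if (word >> 25) & 0b111 != 0b110 or (word >> 8) & 0xF != 0b1011:
        return None

    p = (word >> 24) & 1     # pre-index bit
    u = (word >> 23) & 1     # up (increment) bit
    d = (word >> 22) & 1     # high bit of the first D register number
    load = (word >> 20) & 1  # 1 = load, 0 = store
    if load or p != 0 or u != 1:
        return None  # this sketch only handles the store, increment-after form

    rn = (word >> 16) & 0xF
    vd = (word >> 12) & 0xF
    imm8 = word & 0xFF

    # An odd imm8 marks the deprecated FSTMX form; even means plain VSTM.
    mnemonic = "fstmiax" if imm8 & 1 else "vstmia"
    return mnemonic, rn, (d << 4) | vd, imm8 >> 1


# The bytes from the bug report: imm8 = 0x03 is odd.
print(classify_vfp_store_multiple(bytes([0x03, 0x0B, 0x80, 0xEC])))
```

For the bytes in the report this returns `("fstmiax", 0, 0, 1)`, i.e. base register r0 and a single register starting at d0. With an even imm8 the same function reports `vstmia`.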
2013 Oct 15
0
[LLVMdev] MI scheduler produce badly code with inline function
On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote:
> Hi all,
> I ran into this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched
>
> A small function is scheduled well on its own, but once opt inlines it, the inlined part is scheduled badly.
A bug for this is welcome. Pretty soon, I’ll
2013 Oct 14
2
[LLVMdev] MI scheduler produce badly code with inline function
Hi all,
I ran into this problem when compiling the STREAM benchmark (
http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched
A small function is scheduled well on its own, but once opt inlines it,
the inlined part is scheduled badly.
So I wrote a simple test case (foo.c, at the attached link) and compiled it
in two different ways:
*method A:*
*$clang -O3 foo.c -static -S
2013 Oct 16
3
[LLVMdev] MI scheduler produce badly code with inline function
...000e+00
movt r2, :upper16:b
movt r3, #244
.LBB0_1:
add r0, r12, r1
*vldr d17, [r0]*
*vldr d18, [r0, #8]*
vmul.f64 d17, d17, d16
*vldr d19, [r0, #16]*
*vldr d20, [r0, #24]*
add r0, r2, r1
vmul.f64 d18, d18, d16
add r1, r1, #32
cmp r1, r3
vmul.f64 d19, d19, d16
vmul.f64 d20, d20, d16
vstmia r0, {d17, d18, d19, d20}
bne .LBB0_1
bx lr
Is this just because the A9's per-operand machine model is not implemented
well?
By the way, why do you want to use the new machine model for mi-sched?
Thanks,
Kind regards
Kuan-Hsu
2013/10/15 Andrew Trick <atrick at apple.com>
>
> On Oc...