thr3ads.net - search: "vstmia"

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

1

...'s LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 04...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

0

...'s entirely possible that it's LLVM that's confused about the alignment requirements here. :)

I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:

extend:                 @ @extend
@ BB#0:
    vldr d16, [r0]
    vmovl.s16 q8, d16
    vstmia r1, {d16, d17}
    vldr d16, [r0, #8]
    add r0, r1, #16
    vmovl.s16 q8, d16
    vstmia r0, {d16, d17}
    bx lr

Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 0406C), you're correct about the...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 05

3

Hello Jim,

Thank you for the response. I may be confused about the alignment rules here. I had been looking at the ARM RVCT Assembler Guide, which seems to indicate vld1.16 operates on 16-bit aligned data, unless I am misinterpreting their table (Table 5-11 in ARM DUI 0204H, pp. 5-70 to 5-71). Prior to the table, it does mention that the accesses need to be "element" aligned, where I took

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

2

...'s LLVM that's confused about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 040...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

0

...VM that's confused
> about the alignment requirements here. :)
>
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and get:
> extend: @ @extend
> @ BB#0:
> vldr d16, [r0]
> vmovl.s16 q8, d16
> vstmia r1, {d16, d17}
> vldr d16, [r0, #8]
> add r0, r1, #16
> vmovl.s16 q8, d16
> vstmia r0, {d16, d17}
> bx lr
>
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. Similarly for the stores. According to the ARM ARM (DDI 040...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

2

...the alignment requirements here. :)
>>
>> I think I see, in general, where. I twiddled the IR to give it higher
> alignment (16 bytes) and get:
>> extend: @ @extend
>> @ BB#0:
>> vldr d16, [r0]
>> vmovl.s16 q8, d16
>> vstmia r1, {d16, d17}
>> vldr d16, [r0, #8]
>> add r0, r1, #16
>> vmovl.s16 q8, d16
>> vstmia r0, {d16, d17}
>> bx lr
>>
>> Note that we're using a plain vldr instruction here to load the d
> register, not a vld1 instruction. Similarly for the stores....

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 06

0

...the alignment requirements here. :)
>>
>> I think I see, in general, where. I twiddled the IR to give it higher
> alignment (16 bytes) and get:
>> extend: @ @extend
>> @ BB#0:
>> vldr d16, [r0]
>> vmovl.s16 q8, d16
>> vstmia r1, {d16, d17}
>> vldr d16, [r0, #8]
>> add r0, r1, #16
>> vmovl.s16 q8, d16
>> vstmia r0, {d16, d17}
>> bx lr
>>
>> Note that we're using a plain vldr instruction here to load the d
> register, not a vld1 instruction. Similarly for the stores....

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 07

2

.... :)
>>>
>>> I think I see, in general, where. I twiddled the IR to give it higher
>> alignment (16 bytes) and get:
>>> extend: @ @extend
>>> @ BB#0:
>>> vldr d16, [r0]
>>> vmovl.s16 q8, d16
>>> vstmia r1, {d16, d17}
>>> vldr d16, [r0, #8]
>>> add r0, r1, #16
>>> vmovl.s16 q8, d16
>>> vstmia r0, {d16, d17}
>>> bx lr
>>>
>>> Note that we're using a plain vldr instruction here to load the d
>> register, not a vld1 instru...

[LLVMdev] Unaligned vector memory access for ARM/NEON.

2012 Sep 07

0

...n general, where. I twiddled the IR to give it
> >>> higher
> >> alignment (16 bytes) and get:
> >>> extend: @ @extend
> >>> @ BB#0:
> >>> vldr d16, [r0]
> >>> vmovl.s16 q8, d16
> >>> vstmia r1, {d16, d17}
> >>> vldr d16, [r0, #8]
> >>> add r0, r1, #16
> >>> vmovl.s16 q8, d16
> >>> vstmia r0, {d16, d17}
> >>> bx lr
> >>>
> >>> Note that we're using a plain vldr instruction here to load the d
>...

[LLVMdev] MC Hammer Test results

2012 May 10

0

...ng encode, decode, assemble and disassemble. I think it should store the msbit and lsbit fields as operands and compute the mask at the instruction selection phase.

[bug 5]
echo 0x03 0x0b 0x80 0xec | ./llvm-mc -triple armv7 --show-inst --show-encoding --disassemble

The bit pattern is decoding as VSTMIA r0, {d0} when it should decode to FSTMIAX r0, {d0}. These instructions are a bit of a curiosity in that they are pre-ARMv6 (VFPv1) instruction mnemonics which were not superseded by UAL-style V* mnemonics. They still exist in VFPv4 but their use is deprecated. Any VSTM's with odd numbered im...
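To see why the byte sequence in bug 5 is special, it can help to split the encoding into its fields by hand. The sketch below is not part of the original report and is not LLVM's decoder: the function name `decode_vfp_store_multiple` and the dictionary keys are hypothetical, and the field layout is the one commonly given for the A1 encoding of the 64-bit VSTM (bits 27:25 = 110, coprocessor field 1011, 8-bit immediate in bits 7:0). Under that assumption, an odd immediate selects the deprecated FSTMX form rather than VSTM.

```python
def decode_vfp_store_multiple(encoding: bytes) -> dict:
    """Split a 4-byte little-endian ARM word into VSTM/FSTMX fields.

    Illustrative only: handles the 64-bit store-multiple form and
    rejects anything else instead of guessing.
    """
    if len(encoding) != 4:
        raise ValueError("expected exactly 4 bytes")
    word = int.from_bytes(encoding, "little")

    coproc = (word >> 8) & 0xF
    is_load = (word >> 20) & 1
    if (word >> 25) & 0x7 != 0b110 or coproc != 0b1011 or is_load:
        raise ValueError("not a 64-bit VFP store-multiple encoding")

    imm8 = word & 0xFF
    return {
        "word": word,
        "cond": (word >> 28) & 0xF,   # 0xE means "always"
        "p": (word >> 24) & 1,        # P=0, U=1 is the increment-after form
        "u": (word >> 23) & 1,
        "w": (word >> 21) & 1,        # writeback to the base register
        "rn": (word >> 16) & 0xF,     # base register number
        # First D register: the D bit (bit 22) is the high bit, Vd the low four.
        "vd": (((word >> 22) & 1) << 4) | ((word >> 12) & 0xF),
        "imm8": imm8,
        "regs": imm8 // 2,            # number of D registers transferred
        # Assumption stated above: odd imm8 means the FSTMX form.
        "family": "FSTMX" if imm8 & 1 else "VSTM",
    }


fields = decode_vfp_store_multiple(bytes([0x03, 0x0B, 0x80, 0xEC]))
print(hex(fields["word"]), fields["family"], "r%d" % fields["rn"],
      "d%d" % fields["vd"], "regs=%d" % fields["regs"])
```

For the bytes in the report, the immediate is 0x03, so one D register is stored from base register r0 and the low bit is set, which is consistent with the report's claim that the expected mnemonic is FSTMIAX rather than VSTMIA.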

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 15

0

On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote:
> Hi all,
> I meet this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched
>
> The small function will be scheduled as good code, but if opt inline this function, the inline part will be scheduled as bad code.

A bug for this is welcome. Pretty soon, I’ll

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 14

2

Hi all,

I hit this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched.

The small function is scheduled as good code, but if opt inlines this function, the inlined part is scheduled as bad code.

So I rewrote it as a simple program at the attached link (foo.c), and compiled it with two different methods:

*method A:*
*$clang -O3 foo.c -static -S

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 16

3

...000e+00
    movt r2, :upper16:b
    movt r3, #244
.LBB0_1:
    add r0, r12, r1
    *vldr d17, [r0]*
    *vldr **d18**, [r0, #8]*
    vmul.f64 d17, d17, d16
    *vldr **d19**, [r0, #16]*
    *vldr **d20**, [r0, #24]*
    add r0, r2, r1
    vmul.f64 d18, d18, d16
    add r1, r1, #32
    cmp r1, r3
    vmul.f64 d19, d19, d16
    vmul.f64 d20, d20, d16
    vstmia r0, {d17, d18, d19, d20}
    bne .LBB0_1
    bx lr

Is this just because A9's per-operand machine model is not implemented well?
By the way, why do you want to use the new machine model for mi-sched?

Thanks,
Kind regards
Kuan-Hsu

2013/10/15 Andrew Trick <atrick at apple.com>
>
> On Oc...