thr3ads.net - search: "lcpi0

[ARM] Register pressure with -mthumb forces register reload before each call

2020 Apr 15

2

...mov r4, r0 > bl "<90>w\n " > mov r1, r2 > mov r2, r5 > bl "<90>w\n " > mov r0, r5 > mov r1, r4 > mov r2, r6 > ldr r6, .LCPI0_0 > blx r6 > mov r0, r5 > mov r1, r5 > mov r2, r4 > blx r6 > > regalloc dump (attached) shows: > Inline spilling tGPR:%9 [80r,152r:0) 0@80r weight:3.209746e-03 > From original %3 > also spill snippet...

[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable

2014 Mar 14

3

>> Any thoughs? > > I'm now struggling to see how GCC justifies it. What if a different > translation-unit declared those variables in a different order? I also > can't get the same behaviour here, do you have a more complete > command-line? Ah, I see; the translation-unit that does the optimisation needs to have them as a definition (i.e. "= {0}") rather

[LLVMdev] ARM eabi calling convention

2012 Aug 07

2

...[2 x i32] %0) nounwindt * It doesn't seem that ARM backend can figure out that "[2 x i32] %0" was originally a structure consisting of a single double field. When I run llc, it looks like "%0" is being passed in register r1 and r2. *$ llc vararg1-main.ll -o - ldr r0, .LCPI0_0 ldm r0, {r1, r2} .LCPI0_0: .long .Lmain.s0 ... .Lmain.s0: .long 0 @ double 2.000000e+00 * I am running tests to see if llc targeting mips can correctly compile a bitcode file generated by clang-arm. One of the tests is failing, and I was wondering whe...

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values with 4000, as for arr...

[LLVMdev] LLVM ERROR: No such instruction: `vmovsd ...' ?

2011 Mar 18

2

...00 define i32 @main() { entry: %0 = load double* @g %1 = fmul double 1.000000e+06, %0 store double %1, double* @g ret i32 0 } in test.ll and I run > llc test.ll > gcc test.s I get: test.s:12:no such instruction: `vmovsd _g(%rip), %xmm0' test.s:13:no such instruction: `vmulsd LCPI0_0(%rip), %xmm0,%xmm0' test.s:14:no such instruction: `vmovsd %xmm0, _g(%rip)' I'm completely puzzled. Help? Thanks! N

[LLVMdev] ARM eabi calling convention

2012 Aug 07

0

...It doesn't seem that ARM backend can figure out that "[2 x i32] %0" was > originally a structure consisting of a single double field. When I run llc, > it looks like "%0" is being passed in register r1 and r2. > > $ llc vararg1-main.ll -o - > > ldr r0, .LCPI0_0 > ldm r0, {r1, r2} > > .LCPI0_0: > .long .Lmain.s0 > ... > .Lmain.s0: > .long 0 @ double 2.000000e+00 > > > I am running tests to see if llc targeting mips can correctly compile a > bitcode file generated by clang-arm. &...

[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)

2014 Oct 07

4

...And I expected that minimal positive denormalized double times -0.5 is equal to -0.0, so correct exit code is 1. llvm-3.4.2 on x86 linux target produced the following assembly: .file "fpfail.ll" .section .rodata.cst8,"aM",@progbits,8 .align 8 .LCPI0_0: .quad -4620693217682128896 # double -0.5 .LCPI0_1: .quad -9223372036854775808 # double -0 .text .globl main .align 16, 0x90 .type main,@function main: # @main .cfi_startproc # BB#0: vmov...

[LLVMdev] Contants generation - proposal

2013 Jun 25

2

...Problem description: In X86_64 target the Code Model is "Large". It means that address is 64-bit and IP-relative memory operand can't be used in this case. (Because in IP-relative memory operand the displacement is 32-bit). In order to load constant, we use 2 instructions. movabsq $.LCPI0_0, %rcx vmulpd (%rcx), %ymm0, %ymm0 It happens because .LCPI0_0 is in .rodata section and instruction itself is in .text. If I put the constant in .text, the code will look much better: vmulpd .LCPI0_0(%rip), %ymm0, %ymm0 (2) Proposal Define one more Code Model, let's say "LargeNearCons...

[LLVMdev] LLVM Build Bot failure on llmv-x86_64-ubuntu

2011 Oct 17

0

...llvm/Debug+Asserts/bin/llc < /home/jabbey/src/osuosl/buildbot/sandbox/llvm-x86_64-ubuntu/llvm-x86_64-ubuntu/llvm/test/CodeGen/X86/mmx-pinsrw.ll -mtriple=x86_64-linux -mattr=+mmx,+sse2 produces: .file "<stdin>" .section .rodata.cst16,"aM",@progbits,16 .align 16 .LCPI0_0: .byte 0 # 0x0 .byte 1 # 0x1 .byte 4 # 0x4 .byte 5 # 0x5 .byte 8 # 0x8 .byte 9 # 0x9 .byte 12 # 0xc .byte 13 # 0xd ....

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

2

...ast-math -march=native -mtune=native -DSPILLING_ENSUES=0 /* no spilling */ $ objdump -dC --no-show-raw-insn ./a.out ... 00000000004005c0 <main>: 4005c0: movdqa 0x1a58(%rip),%xmm0 # 402020 <x> 4005c8: psrld $0x17,%xmm0 4005cd: paddd 0x12b(%rip),%xmm0 # 400700 <.LCPI0_0> 4005d5: cvtdq2ps %xmm0,%xmm1 4005d8: divps 0x131(%rip),%xmm1 # 400710 <.LCPI0_1> 4005df: cvttps2dq %xmm1,%xmm1 4005e3: pmullw 0x135(%rip),%xmm1 # 400720 <.LCPI0_2> 4005eb: psubd %xmm1,%xmm0 4005ef: movq %xmm0,%rax 4005f4: movslq %eax,%rcx 4005f7: sa...

[ARM] Register pressure with -mthumb forces register reload before each call

2020 Mar 31

2

Hi, Compiling attached test-case, which is reduced version of uECC_shared_secret from tinycrypt library [1], with --target=arm-linux-gnueabi -march=armv6-m -Oz -S results in reloading of register holding function's address before every call to blx: ldr r3, .LCPI0_0 blx r3 mov r0, r6 mov r1, r5 mov r2, r4 ldr r3, .LCPI0_0 blx r3 ldr r3, .LCPI0_0 mov r0, r6 mov r1, r5 mov r2, r4 blx r3 .LCPI0_0: .long foo >Fro...

[LLVMdev] Use of 'ldrd' instructions with unaligned addresses on armv7 (Major bug in LLVM optimizer?)

2012 Jan 24

0

On Tue, Jan 24, 2012 at 08:36:17AM -0800, Esperanza de Escobar wrote: > No one is arguing that there aren't ABI specs or LLVM design > guidelines that say that unaligned accesses "should not", "could not" > or "aren't guaranteed to" work, because it's besides the point. No, it is the core of the issue. Standard C gives the compiler certain

visitShiftByConstant of DAGCombiner

2016 Dec 08

2

...of it: For code as below: unsigned array[4]; unsigned foo(unsigned long x) { return array[(x>>2)&3ul]; sequence before this canonicalisation (ARM): foo: .fnstart @ BB#0: @ %entry lsrs r0, r0, #2 movs r1, #3 ands r1, r0 lsls r0, r1, #2 ldr r1, .LCPI0_0 ldr r0, [r1, r0] bx lr .p2align 2 sequence after this canonicalisation: foo: .fnstart @ BB#0: @ %entry movs r1, #12 ands r1, r0 ldr r0, .LCPI0_0 ldr r0, [r0, r1] bx lr .p2align 2 This canonicalisation makes shift folding possible. But I wonder if only shift contex...

[ARM] Register pressure with -mthumb forces register reload before each call

2020 Apr 15

4

...mov r6, r2 mov r5, r1 mov r4, r0 bl "<90>w\n " mov r1, r2 mov r2, r5 bl "<90>w\n " mov r0, r5 mov r1, r4 mov r2, r6 ldr r6, .LCPI0_0 blx r6 mov r0, r5 mov r1, r5 mov r2, r4 blx r6 regalloc dump (attached) shows: Inline spilling tGPR:%9 [80r,152r:0) 0@80r weight:3.209746e-03 From original %3 also spill snippet %8 [152r,232r:0) 0@152r weight:2.104167e...

[ARM] Register pressure with -mthumb forces register reload before each call

2020 Apr 07

2

...ling attached test-case, which is reduced version of > uECC_shared_secret from tinycrypt library [1], with > --target=arm-linux-gnueabi -march=armv6-m -Oz -S > results in reloading of register holding function's address before > every call to blx: > > ldr r3, .LCPI0_0 > blx r3 > mov r0, r6 > mov r1, r5 > mov r2, r4 > ldr r3, .LCPI0_0 > blx r3 > ldr r3, .LCPI0_0 > mov r0, r6 > mov r1, r5 > mov r2, r4 >...

[LLVMdev] ARM eabi calling convention

2012 Aug 07

0

On Aug 6, 2012, at 3:21 PM, Akira Hatanaka <ahatanak at gmail.com> wrote: > When I compile this program > > $ cat vararg1-main.c > > typedef struct { > double d; > } S0; > > S0 g1; > > void foo0(int a, ...); > > int main(int argc, char **argv) { > S0 s0 = { 2.0 }; > > foo0(1, s0); > > printf("%f\n", g1.d); >

[LLVMdev] ARM Intruction Constraint DestReg!=SrcReg patch?

2010 Nov 25

2

...------------- next part -------------- .syntax unified .cpu arm10tdmi .eabi_attribute 10, 2 .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .file "foo.c" .text .globl foo .align 2 .type foo,%function foo: ldr r1, .LCPI0_0 ldr r0, [r1] mov r2, #123 mul r0, r0, r2 mov r2, #15 orr r2, r2, #15, 24 add r0, r0, #114, 30 and r0, r0, r2 str r0, [r1] bx lr .align 2 .LCPI0_0: .long bar .Ltmp0: .size foo, .Ltmp0-foo .type bar,%object .comm bar,4,4 .ident "GCC: (GNU) 4.2.1 (Based on Apple Inc. build 5658)...

[LLVMdev] ARM eabi calling convention

2012 Aug 07

2

...RM backend can figure out that "[2 x i32] %0" was >> originally a structure consisting of a single double field. When I run llc, >> it looks like "%0" is being passed in register r1 and r2. >> >> $ llc vararg1-main.ll -o - >> >> ldr r0, .LCPI0_0 >> ldm r0, {r1, r2} >> >> .LCPI0_0: >> .long .Lmain.s0 >> ... >> .Lmain.s0: >> .long 0 @ double 2.000000e+00 >> >> >> I am running tests to see if llc targeting mips can correctly compile a >...

[LLVMdev] ARM eabi calling convention

2012 Aug 06

2

When I compile this program *$ cat vararg1-main.c typedef struct { double d; } S0; S0 g1; void foo0(int a, ...); int main(int argc, char **argv) { S0 s0 = { 2.0 }; foo0(1, s0); printf("%f\n", g1.d); * * return 0; }* with this command, *$ clang -target arm-none-linux-gnueabi-gcc -ccc-clang-archs armv7 -emit-llvm vararg1-main.c -S -o vararg1-main.ll -O3* I get this

[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable

2014 Mar 14

2

...tmp3-.L0$pb), %ebx leal g0@GOTOFF(%ebx), %eax movl %eax, (%esp) calll bar@PLT leal g1@GOTOFF(%ebx), %eax movl %eax, (%esp) calll bar@PLT Which is ok, since the add of ebx is folded and the constant is an immediate in x86. On ARM, that is not the case. We produce ldr r0, .LCPI0_0 add r4, pc, r0 // r4 is the equivalent of ebx in the x86 case. ldr r0, .LCPI0_1 // r0 is the constant that is an immediate in x86. add r0, r0, r4 // that is the add that is folded in x86 ... .LCPI0_0: .long _GLOBAL_OFFSET_TABLE_-(.LPC0_0+8) .LCPI0_1:...
