thr3ads.net - search: "lcpi0

[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable

2014 Mar 14

3

>> Any thoughts?
>
> I'm now struggling to see how GCC justifies it. What if a different translation-unit declared those variables in a different order? I also can't get the same behaviour here, do you have a more complete command-line?
Ah, I see; the translation-unit that does the optimisation needs to have them as a definition (i.e. "= {0}") rather

[LLVMdev] Contants generation

2013 Jun 25

2

Hi again, Actually, I've just been looking at the existing code and the ARM solution may be over-complicated for this situation. You should be able to override EmitConstantPool directly, or possibly even just override getSectionForConstantKind in X86LinuxTargetObjectFile (and perhaps others) to return .text. Tim.

[LLVMdev] Contants generation

2013 Jun 25

0

That's what I actually did now, locally in the code. But I still see the "movabsq":
    .text
    .align 8, 0x90
    .LCPI0_0:
    .quad 4606281698874543309   # double 0.9
    .LCPI0_1:
    .quad 4631147119616759172   # double 42.2794408
    .LCPI0_2:
    .long 1065353216   # float 1
    .zero 4
    ...
    movabsq $.LCPI0_1, %rax   # encoding: [0x48,0xb8,A,A,A,A,A,A,A,A]
                              # fixup A - offset: 2, value: .LCPI0_1, kind: FK_Data_8
    vbroadcasts...

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

2

...w-raw-insn ./a.out ...
00000000004005c0 <main>:
  4005c0: movdqa 0x1a58(%rip),%xmm0   # 402020 <x>
  4005c8: psrld $0x17,%xmm0
  4005cd: paddd 0x12b(%rip),%xmm0   # 400700 <.LCPI0_0>
  4005d5: cvtdq2ps %xmm0,%xmm1
  4005d8: divps 0x131(%rip),%xmm1   # 400710 <.LCPI0_1>
  4005df: cvttps2dq %xmm1,%xmm1
  4005e3: pmullw 0x135(%rip),%xmm1   # 400720 <.LCPI0_2>
  4005eb: psubd %xmm1,%xmm0
  4005ef: movq %xmm0,%rax
  4005f4: movslq %eax,%rcx
  4005f7: sar $0x20,%rax
  4005fb: punpckhqdq %xmm0,%xmm0
  4005ff: movq %xmm0,%rdx
  400604: movslq %edx,...
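The address annotations in this listing (for example `# 400700 <.LCPI0_0>`) can be checked by hand: with RIP-relative addressing, the operand address is the address of the next instruction plus the displacement. The sketch below is only an illustration, not part of the original post. The function name `rip_relative_target` and the table `OPERANDS` are made up for this example, and the "next instruction" addresses are read off the listing itself.

```python
# Each row: (mnemonic, address of the NEXT instruction, displacement,
# target address printed by the disassembler), all copied from the listing.
OPERANDS = [
    ("movdqa", 0x4005C8, 0x1A58, 0x402020),  # <x>
    ("paddd",  0x4005D5, 0x12B,  0x400700),  # <.LCPI0_0>
    ("divps",  0x4005DF, 0x131,  0x400710),  # <.LCPI0_1>
    ("pmullw", 0x4005EB, 0x135,  0x400720),  # <.LCPI0_2>
]


def rip_relative_target(next_insn_addr: int, disp: int) -> int:
    """Effective address of a disp(%rip) operand.

    %rip holds the address of the following instruction while the current
    one executes, so the target is simply next_insn_addr + disp.
    """
    return next_insn_addr + disp


if __name__ == "__main__":
    for mnemonic, next_addr, disp, annotated in OPERANDS:
        target = rip_relative_target(next_addr, disp)
        status = "matches" if target == annotated else "differs from"
        print(f"{mnemonic:7s} {next_addr:#x} + {disp:#x} = {target:#x} "
              f"({status} the annotation)")
```

The three constant-pool entries land 16 bytes apart (0x400700, 0x400710, 0x400720), which is consistent with each being a 128-bit vector constant.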

[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable

2014 Mar 14

2

    ...), %eax
    movl %eax, (%esp)
    calll bar@PLT
Which is OK, since the add of ebx is folded and the constant is an immediate in x86. On ARM, that is not the case. We produce
    ldr r0, .LCPI0_0
    add r4, pc, r0   // r4 is the equivalent of ebx in the x86 case.
    ldr r0, .LCPI0_1   // r0 is the constant that is an immediate in x86.
    add r0, r0, r4   // that is the add that is folded in x86
    ...
    .LCPI0_0:
    .long _GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)
    .LCPI0_1:
    .long g0(GOTOFF)
For ARM, codegen already keeps track of offsets so it can implement the consta...

[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)

2014 Oct 07

4

...es -0.5 is equal to -0.0, so correct exit code is 1. llvm-3.4.2 on x86 linux target produced the following assembly:
    .file "fpfail.ll"
    .section .rodata.cst8,"aM",@progbits,8
    .align 8
    .LCPI0_0:
    .quad -4620693217682128896   # double -0.5
    .LCPI0_1:
    .quad -9223372036854775808   # double -0
    .text
    .globl main
    .align 16, 0x90
    .type main,@function
    main:   # @main
    .cfi_startproc
    # BB#0:
    vmovsd g, %xmm0
    vmulsd .LCPI0_0, %xmm0, %xmm0...
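The `.quad` values in this constant pool are the IEEE-754 bit patterns of the doubles named in the comments, printed as signed 64-bit integers. As a rough check, the sketch below reinterprets the two integers from the snippet as doubles. The helper name `quad_to_double` is hypothetical and not taken from the thread.

```python
import struct


def quad_to_double(value: int) -> float:
    """Reinterpret a 64-bit integer (signed or unsigned) as an IEEE-754 double."""
    # Mask to 64 bits so negative values map to their two's-complement pattern.
    bits = value & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    lcpi0_0 = quad_to_double(-4620693217682128896)   # commented as double -0.5
    lcpi0_1 = quad_to_double(-9223372036854775808)   # commented as double -0
    print(lcpi0_0, lcpi0_1)
    # Negative zero compares equal to positive zero, but not to -0.5.
    print(lcpi0_1 == 0.0, lcpi0_0 == lcpi0_1)
```

The second constant is the sign bit alone (0x8000000000000000), which is why it decodes to negative zero.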

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

...
>>>>> .quad 11 # 0xb
>>>>> .quad 12 # 0xc
>>>>> .quad 13 # 0xd
>>>>> .quad 14 # 0xe
>>>>> .quad 15 # 0xf
>>>>> .LCPI0_1:
>>>>> .quad 0 # 0x0
>>>>> .quad 1 # 0x1
>>>>> .quad 2 # 0x2
>>>>> .quad 3 # 0x3
>>>>> .quad 4 # 0x4
>>>>...

[LLVMdev] Poor register allocation (constants causing spilling)

2015 Jul 14

4

...gh the constant is clearly rematerializable, the allocator has gone to some length to keep it in a register, and it has spilled a value to the stack. It would have been cheaper to simply fold the constant load into the 3 uses. This is not the only example. Later on we can see this:
    vmovaps .LCPI0_1(%rip), %xmm6   # xmm6 = [2147483648,2147483648,...]
    vxorps %xmm6, %xmm2, %xmm3
    ...
    vandps %xmm6, %xmm5, %xmm2
    ...
    vmovaps %xmm1, -56(%rsp)   # 16-byte Spill
    vmovaps %xmm6, %xmm1
    ...
    vmovaps -56(%rsp), %xmm0   # 16-byte Reload
    ...
    vxorps %xmm1, %x...
