thr3ads.net - search: "lcpi0

Displaying 5 results from an estimated 5 matches for "lcpi0_2".

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

...at these locations so zmm22 will contain values, not indexes. Suppose [8]={1}, [9]={5}, [10]={4}...... so zmm22 will become zmm22={1, 5, 4, 3, 8, 7, 6, 2}......these are those 64-bit values loaded from memory indexes.

vpbroadcastq zmm2, qword ptr [rip + .LCPI0_2] ; here .LCPI0_2=4000 means broadcast the value at this index, e.g. if this location contains 2, then zmm2={2,2,2,2.....2}.

vpmuludq zmm14, zmm10, zmm2 ; this step is value multiplication, not index; there seems no point in multiplying these values here since we ...
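The two instructions discussed in this snippet can be modeled lane by lane. The following is a minimal Python sketch, assuming a zmm register holds eight 64-bit lanes. The function names mirror the mnemonics but are hypothetical helpers, and the contents of zmm10 are made up for illustration (the snippet does not show them). vpmuludq multiplies only the low unsigned 32 bits of each 64-bit lane and keeps the full 64-bit product.

```python
LANES = 8                      # a 512-bit zmm register holds eight 64-bit lanes
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def vpbroadcastq(value):
    """Model of vpbroadcastq: copy one 64-bit value into every lane."""
    return [value & MASK64] * LANES

def vpmuludq(a, b):
    """Model of vpmuludq: per lane, multiply the low unsigned 32 bits
    of each operand and keep the full 64-bit product."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]

# The snippet's example: the constant at .LCPI0_2 is assumed to be 2.
zmm2 = vpbroadcastq(2)

# Hypothetical contents for zmm10, chosen only for illustration.
zmm10 = [1, 5, 4, 3, 8, 7, 6, 2]

zmm14 = vpmuludq(zmm10, zmm2)
print(zmm14)
```

Because only the low 32 bits of each lane take part, any upper bits in zmm10 would be ignored by the multiply.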

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

...02020 <x>
  4005c8: psrld $0x17,%xmm0
  4005cd: paddd 0x12b(%rip),%xmm0 # 400700 <.LCPI0_0>
  4005d5: cvtdq2ps %xmm0,%xmm1
  4005d8: divps 0x131(%rip),%xmm1 # 400710 <.LCPI0_1>
  4005df: cvttps2dq %xmm1,%xmm1
  4005e3: pmullw 0x135(%rip),%xmm1 # 400720 <.LCPI0_2>
  4005eb: psubd %xmm1,%xmm0
  4005ef: movq %xmm0,%rax
  4005f4: movslq %eax,%rcx
  4005f7: sar $0x20,%rax
  4005fb: punpckhqdq %xmm0,%xmm0
  4005ff: movq %xmm0,%rdx
  400604: movslq %edx,%rsi
  400607: sar $0x20,%rdx
  40060b: movss 0x400740(,%rax,4),%xmm0
  400614: movss 0x400740(...
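The `# 400700 <.LCPI0_0>` annotations in this listing follow from RIP-relative addressing: the displacement is added to the address of the next instruction. The Python sketch below reproduces that arithmetic using only the addresses visible in the snippet. The helper name `rip_relative_target` is hypothetical, and each instruction length is inferred from the address of the following line.

```python
def rip_relative_target(insn_addr, insn_len, disp):
    """RIP-relative operand address: next-instruction address plus displacement."""
    return insn_addr + insn_len + disp

# (instruction address, next instruction address, displacement, target in listing)
rows = [
    (0x4005CD, 0x4005D5, 0x12B, 0x400700),  # paddd  -> .LCPI0_0
    (0x4005D8, 0x4005DF, 0x131, 0x400710),  # divps  -> .LCPI0_1
    (0x4005E3, 0x4005EB, 0x135, 0x400720),  # pmullw -> .LCPI0_2
]

results = []
for addr, next_addr, disp, listed in rows:
    target = rip_relative_target(addr, next_addr - addr, disp)
    results.append(target == listed)
    print(hex(addr), "->", hex(target), "matches listing:", target == listed)
```

The three constant-pool entries land 16 bytes apart, which is consistent with each being one 128-bit xmm operand.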

[LLVMdev] Contants generation

2013 Jun 25

Hi again, Actually, I've just been looking at the existing code and the ARM solution may be over-complicated for this situation. You should be able to override EmitConstantPool directly, or possibly even just override getSectionForConstantKind in X86LinuxTargetObjectFile (and perhaps others) to return .text. Tim.

[LLVMdev] Contants generation

2013 Jun 25

That's what I actually did now, locally in the code. But I still see the "movabsq":

        .text
        .align 8, 0x90
.LCPI0_0:
        .quad 4606281698874543309 # double 0.9
.LCPI0_1:
        .quad 4631147119616759172 # double 42.2794408
.LCPI0_2:
        .long 1065353216 # float 1
        .zero 4
...
        movabsq $.LCPI0_1, %rax # encoding: [0x48,0xb8,A,A,A,A,A,A,A,A]
                                # fixup A - offset: 2, value: .LCPI0_1, kind: FK_Data_8
        vbroadcastsd (%rax), %ymm0 # encoding: [0xc4,0xe2,0x7d,0x19,0x00]
Ac...
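The integers in the constant pool above are the raw IEEE 754 bit patterns of the values named in the trailing comments. As a quick check, the Python sketch below reinterprets them with the standard `struct` module. The helper names `bits_to_double` and `bits_to_float` are hypothetical. The `42.2794408` in the listing appears to be a rounded rendering, so it is compared with a tolerance rather than exactly.

```python
import struct

def bits_to_double(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def bits_to_float(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

lcpi0_0 = bits_to_double(4606281698874543309)   # listing comment: double 0.9
lcpi0_1 = bits_to_double(4631147119616759172)   # listing comment: double 42.2794408
lcpi0_2 = bits_to_float(1065353216)             # listing comment: float 1

print(lcpi0_0, lcpi0_1, lcpi0_2)
```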

[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable

2014 Mar 12

...ptimized code than GCC. For example:

test.cpp:
  void init(void *);
  int g0[100];
  int g1[100];
  int g2[100];
  void foo() {
    init(&g0);
    init(&g1);
    init(&g2);
  }

Clang will emit 1 GOT entry for each GV and 2 instructions to get the address:

  ldr r0, .LCPI0_2
  add r0, r0, r4
  bl _Z4initPv(PLT)

GCC does this only for the first GV. The remaining GV addresses are computed directly:

  ldr r4, .L2
.LPIC0:
  add r4, pc, r4    -> get &g0 via GOT_PC relative
  mov r0, r4
  bl _Z4initPv(PLT)...
