Displaying 5 results from an estimated 5 matches for "lcpi0_2".
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
...at these locations so zmm22 will contain values not
>> indexes. Suppose [8]={1}, [9]={5}, [10]={4}, ...; then zmm22 becomes
>> zmm22={1, 5, 4, 3, 8, 7, 6, 2}. These are the 64-bit values loaded
>> from those memory indexes.
>>
>> vpbroadcastq zmm2, qword ptr [rip + .LCPI0_2]; here .LCPI0_2=4000 means
>> broadcast the value at this index, e.g. if this location contains 2 then
>> zmm2={2,2,2,2.....2}.
>>
>> vpmuludq zmm14, zmm10, zmm2 ; this step multiplies values, not
>> indexes; there seems to be no point in multiplying these values here since we
...
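As a rough illustration of the two instructions discussed in this snippet, here is a minimal Python sketch of their lane-wise behaviour. It models a zmm register as a list of eight 64-bit integers. The function names simply mirror the mnemonics, and the contents of zmm10 are hypothetical (the thread excerpt does not show them); the broadcast value 2 is the example used in the message.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def vpbroadcastq(value, lanes=8):
    """Copy one 64-bit value into every 64-bit lane (8 lanes for a zmm register)."""
    return [value & MASK64] * lanes


def vpmuludq(a, b):
    """Per 64-bit lane: multiply the low 32 bits of both inputs as unsigned
    integers and keep the full 64-bit product."""
    return [((x & MASK32) * (y & MASK32)) & MASK64 for x, y in zip(a, b)]


# The value stored at .LCPI0_2 is assumed to be 2, as in the message's example.
zmm2 = vpbroadcastq(2)

# Hypothetical contents for zmm10; the excerpt does not show them.
zmm10 = [1, 5, 4, 3, 8, 7, 6, 2]

zmm14 = vpmuludq(zmm10, zmm2)
print(zmm14)
```

Note that only the low 32 bits of each lane take part in the multiplication, so any upper bits in zmm10 or zmm2 would be ignored.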
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...02020 <x>
4005c8: psrld $0x17,%xmm0
4005cd: paddd 0x12b(%rip),%xmm0 # 400700 <.LCPI0_0>
4005d5: cvtdq2ps %xmm0,%xmm1
4005d8: divps 0x131(%rip),%xmm1 # 400710 <.LCPI0_1>
4005df: cvttps2dq %xmm1,%xmm1
4005e3: pmullw 0x135(%rip),%xmm1 # 400720 <.LCPI0_2>
4005eb: psubd %xmm1,%xmm0
4005ef: movq %xmm0,%rax
4005f4: movslq %eax,%rcx
4005f7: sar $0x20,%rax
4005fb: punpckhqdq %xmm0,%xmm0
4005ff: movq %xmm0,%rdx
400604: movslq %edx,%rsi
400607: sar $0x20,%rdx
40060b: movss 0x400740(,%rax,4),%xmm0
400614: movss 0x400740(...
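The `# 400700 <.LCPI0_0>` annotations in the listing above come from RIP-relative addressing: the displacement is added to the address of the next instruction. A small Python sketch that rechecks the three constant-pool references, using only addresses and displacements visible in the listing (the helper name `rip_relative_target` is made up for illustration):

```python
def rip_relative_target(next_insn_addr, disp):
    """Effective address of a RIP-relative operand: RIP already points at the
    instruction that follows the one being executed."""
    return next_insn_addr + disp


# (address of the following instruction, displacement, label in the listing)
rows = [
    (0x4005D5, 0x12B, ".LCPI0_0"),  # paddd  at 4005cd
    (0x4005DF, 0x131, ".LCPI0_1"),  # divps  at 4005d8
    (0x4005EB, 0x135, ".LCPI0_2"),  # pmullw at 4005e3
]

targets = {label: rip_relative_target(nxt, disp) for nxt, disp, label in rows}

for label, addr in targets.items():
    print(f"{label}: {addr:#x}")
```

The computed addresses are 16 bytes apart, which is consistent with three 128-bit constants laid out back to back.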
2013 Jun 25
2
[LLVMdev] Contants generation
Hi again,
Actually, I've just been looking at the existing code and the ARM
solution may be over-complicated for this situation.
You should be able to override EmitConstantPool directly, or possibly
even just override getSectionForConstantKind in
X86LinuxTargetObjectFile (and perhaps others) to return .text.
Tim.
2013 Jun 25
0
[LLVMdev] Contants generation
That's what I actually did, locally in the code.
But I still see the "movabsq":
.text
.align 8, 0x90
.LCPI0_0:
.quad 4606281698874543309 # double 0.9
.LCPI0_1:
.quad 4631147119616759172 # double 42.2794408
.LCPI0_2:
.long 1065353216 # float 1
.zero 4
...
movabsq $.LCPI0_1, %rax # encoding: [0x48,0xb8,A,A,A,A,A,A,A,A]
# fixup A - offset: 2, value: .LCPI0_1, kind: FK_Data_8
vbroadcastsd (%rax), %ymm0 # encoding: [0xc4,0xe2,0x7d,0x19,0x00]
Ac...
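The integer literals in the constant pool above are the raw IEEE 754 bit patterns of the values named in the trailing comments. A minimal Python sketch that reinterprets them, assuming little-endian layout as on x86-64 (the helper names are illustrative only):

```python
import struct


def quad_to_double(bits):
    """Reinterpret a 64-bit integer (.quad) as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def long_to_float(bits):
    """Reinterpret a 32-bit integer (.long) as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


lcpi0_0 = quad_to_double(4606281698874543309)
lcpi0_1 = quad_to_double(4631147119616759172)
lcpi0_2 = long_to_float(1065353216)

print(lcpi0_0, lcpi0_1, lcpi0_2)
```

The `.zero 4` after the `.long` pads the 4-byte float to 8 bytes.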
2014 Mar 12
3
[LLVMdev] [ARM] [PIC] optimizing the loading of hidden global variable
...ptimized code than GCC.
For example:
test.cpp:
void init(void *);
int g0[100];
int g1[100];
int g2[100];
void foo() {
init(&g0);
init(&g1);
init(&g2);
}
Clang will emit 1 GOT entry for each GV and 2 instructions to get the
address:
ldr r0, .LCPI0_2
add r0, r0, r4
bl _Z4initPv(PLT)
GCC does this only for the first GV. The addresses of the remaining GVs are
computed directly:
ldr r4, .L2
.LPIC0:
add r4, pc, r4 @ get &g0 via GOT_PC Relative
mov r0, r4
bl _Z4initPv(PLT)...