Displaying 20 results from an estimated 23 matches for "lcpi1_0".
2012 Jul 19 · 2 · [LLVMdev] Help with PPC64 JIT
...g the JIT function code is an ODP.
Now I'm trying to make the relocation work properly. Using the testcase '2003-01-04-ArgumentBug',
the assembly generated for the main function is the following:
.L.main:
# BB#0:
mflr 0
std 0, 16(1)
stdu 1, -112(1)
lis 3, .LCPI1_0@ha
li 4, 1
lfs 1, .LCPI1_0@l(3)
li 3, 0
bl foo
nop
addi 1, 1, 112
ld 0, 16(1)
mtlr 0
blr
Which is correct; however, for the JIT version generated in memory the relocations cause some issues.
First the 'lis 3, .LCP...
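The lis/lfs pair above splits the 32-bit constant-pool offset into a high-adjusted half (@ha) and a signed low half (@l), so a JIT resolver has to patch both instructions consistently. A minimal C sketch of that arithmetic, assuming a 32-bit displacement (the +0x8000 in the @ha half compensates for the sign extension of the low 16 bits; the address value is hypothetical):

#include <stdint.h>
#include <stdio.h>

/* @ha: high 16 bits, adjusted so that adding the sign-extended low
   half reconstructs the original address exactly. */
static uint16_t ha16(uint32_t addr) { return (uint16_t)((addr + 0x8000) >> 16); }
/* @l: signed low 16 bits, as consumed by the displacement of lfs. */
static int16_t lo16(uint32_t addr) { return (int16_t)(addr & 0xFFFF); }

int main(void) {
    uint32_t addr = 0x1234ABCD;  /* hypothetical address of .LCPI1_0 */
    /* "lis 3, ha16" puts ha16 << 16 into r3; the load instruction then
       adds the sign-extended low half, yielding the original address. */
    uint32_t rebuilt = ((uint32_t)ha16(addr) << 16) + (uint32_t)(int32_t)lo16(addr);
    printf("0x%08x == 0x%08x\n", addr, rebuilt);
    return 0;
}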
2012 Jul 20 · 0 · [LLVMdev] Help with PPC64 JIT
...I'm trying to make the relocation work properly. Using the testcase '2003-01-04-ArgumentBug',
> the assembly generated for the main function is the following:
>
> .L.main:
> # BB#0:
> mflr 0
> std 0, 16(1)
> stdu 1, -112(1)
> lis 3, .LCPI1_0@ha
> li 4, 1
> lfs 1, .LCPI1_0@l(3)
> li 3, 0
> bl foo
> nop
> addi 1, 1, 112
> ld 0, 16(1)
> mtlr 0
> blr
>
> Which is correct, however for the JIT one generated in memory the re...
2007 Sep 24 · 2 · [LLVMdev] RFC: Tail call optimization X86
...grep 8 to ..| grep 9 since
> +; with new fastcc has std call semantics causing a stack adjustment
> +; after the function call
>
> Not sure if I understand this. Can you illustrate with an example?
Sure
The code generated used to be
_array:
subl $12, %esp
movss LCPI1_0, %xmm0
mulss 16(%esp), %xmm0
movss %xmm0, (%esp)
call L_qux$stub
mulss LCPI1_0, %xmm0
addl $12, %esp
ret
FastCC used to have caller-pops-arguments semantics, so there was no stack
adjustment after the
call to qux. Now FastCC has callee pops arg...
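The same distinction exists on i386 as cdecl (caller pops) versus stdcall (callee pops): when the callee's ret also pops the arguments, the caller must adjust %esp again after the call before reusing that stack space, which is the extra instruction being discussed. A minimal sketch, assuming a 32-bit x86 build (e.g. gcc -m32); the function names are hypothetical:

#include <stdio.h>

/* cdecl: the caller pops the 4-byte argument after the call returns,
   so a single stack adjustment can cover a run of calls. */
__attribute__((cdecl)) float qux_caller_pops(float x) { return x * 2.0f; }

/* stdcall: the callee pops its own argument ("ret $4"), so the caller
   needs a fresh stack adjustment after each call, as in the asm above. */
__attribute__((stdcall)) float qux_callee_pops(float x) { return x * 2.0f; }

int main(void) {
    printf("%f %f\n", qux_caller_pops(1.5f), qux_callee_pops(1.5f));
    return 0;
}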
2007 Sep 24 · 0 · [LLVMdev] RFC: Tail call optimization X86
...w fastcc has std call semantics causing a stack adjustment
>> +; after the function call
>>
>> Not sure if I understand this. Can you illustrate with an example?
>
> Sure
>
> The code generated used to be
> _array:
> subl $12, %esp
> movss LCPI1_0, %xmm0
> mulss 16(%esp), %xmm0
> movss %xmm0, (%esp)
> call L_qux$stub
> mulss LCPI1_0, %xmm0
> addl $12, %esp
> ret
>
> FastCC used to be caller pops arguments so there was no stack
> adjustment after the
> ca...
2017 Apr 19 · 3 · [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...> jb L3
> subss LC0(%rip), %xmm0
> movabsq $-9223372036854775808, %rax
> cvttss2siq %xmm0, %rdx
> xorq %rax, %rdx
> L3:
> movq %rdx, %rax
> ret
>
> instead of
>
> _conv:
> movss LCPI1_0(%rip), %xmm1
> cvttss2siq %xmm0, %rcx
> movaps %xmm0, %xmm2
> subss %xmm1, %xmm2
> cvttss2siq %xmm2, %rax
> movabsq $-9223372036854775808, %rdx
> xorq %rdx, %rax
> ucomiss %xmm1, %xmm0
> cmovb %rcx, %rax
>...
2007 Sep 25 · 2 · [LLVMdev] RFC: Tail call optimization X86
...there was no stack
> > adjustment after the
> > call to qux. Now FastCC has callee pops arguments on return semantics
> > so the
> > x86 backend inserts a stack adjustment after the call.
> >
> > _array:
> > subl $12, %esp
> > movss LCPI1_0, %xmm0
> > mulss 16(%esp), %xmm0
> > movss %xmm0, (%esp)
> > call L_qux$stub
> > subl $4, %esp << stack adjustment because qux pops
> > arguments on return
> > mulss LCPI1_0, %xmm0
> > addl...
2014 Mar 26 · 3 · [LLVMdev] [cfe-dev] computing a conservatively rounded square of a double
...Interval s;
> s.nlo = x * -x;
> s.hi = x * x;
> return s;
> }
>
> Both multiplies are necessary, since they round in different
> directions. However, clang does not know this, assumes that x * -x =
> -(x * x), and simplifies down to a single multiply:
>
> .LCPI1_0:
> .quad -9223372036854775808 # double -0.000000e+00
> .quad -9223372036854775808 # double -0.000000e+00
> .text
> .globl _Z21inspect_singleton_sqrd
> .align 16, 0x90
> .type _Z21inspect_singleton_sqrd,@function
> _Z21inspect_singleton_sqrd:...
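Under the default round-to-nearest mode x * -x and -(x * x) do agree, which is presumably why the fold looks harmless; the two multiplies diverge only under directed rounding. A minimal sketch showing a one-ulp difference under upward rounding, assuming the platform honors fesetround at run time (compile with FP-environment-respecting options such as -frounding-math):

#include <fenv.h>
#include <stdio.h>

int main(void) {
    volatile double x = 1.0000000000000002;  /* 1 + 2^-52 */
    fesetround(FE_UPWARD);
    volatile double nlo = x * -x;  /* negative product rounds up: magnitude shrinks */
    volatile double hi  = x * x;   /* positive product rounds up: magnitude grows */
    printf("%a vs %a\n", nlo, -hi);  /* differ in the last bit */
    fesetround(FE_TONEAREST);
    return 0;
}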
2007 Sep 24 · 0 · [LLVMdev] RFC: Tail call optimization X86
Hi Arnold,
This is a very good first step! Thanks! Comments below.
Evan
Index: test/CodeGen/X86/constant-pool-remat-0.ll
===================================================================
--- test/CodeGen/X86/constant-pool-remat-0.ll (revision 42247)
+++ test/CodeGen/X86/constant-pool-remat-0.ll (working copy)
@@ -1,8 +1,10 @@
; RUN: llvm-as < %s | llc -march=x86-64 | grep LCPI | count 3
;
2007 Sep 23 · 2 · [LLVMdev] RFC: Tail call optimization X86
The patch is against revision 42247.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tailcall-src.patch
Type: application/octet-stream
Size: 62639 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070923/4770302f/attachment.obj>
2010 Mar 10 · 1 · [LLVMdev] Vectors of length 3 as first-class types
...xpecting it to load the first three lanes of %xmm0. (If returning a
vector of four, %xmm0 is used.) But the generated assembly seems to be
using return by hidden pointer, even though it appears to have allocated
the vector with padding in preparation for this:
.LCPI1_0: # constant pool <4 x i32>
.long 1 # 0x1
.long 2 # 0x2
.long 3 # 0x3
.zero 4
.text
Should this data not be loaded as-is into %xmm0, as in the case of a
vector of four?:
retFour:...
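For reference, the kind of function under discussion can be written with Clang's ext_vector_type; whether the 3-element return comes back in %xmm0 (padded to four lanes, as the constant pool above suggests) or through a hidden sret pointer is exactly the ABI question raised here. A minimal sketch (function name hypothetical):

typedef int v3i32 __attribute__((ext_vector_type(3)));

/* Returns the <3 x i32> constant shown in the pool above; the question
   is whether this lowers to a single %xmm0 load or to a store through
   a hidden pointer argument. */
v3i32 retThree(void) {
    v3i32 r;
    r.x = 1;
    r.y = 2;
    r.z = 3;
    return r;
}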
2012 Jan 24 · 0 · [LLVMdev] Use of 'ldrd' instructions with unaligned addresses on armv7 (Major bug in LLVM optimizer?)
On Tue, Jan 24, 2012 at 08:36:17AM -0800, Esperanza de Escobar wrote:
> No one is arguing that there aren't ABI specs or LLVM design
> guidelines that say that unaligned accesses "should not", "could not"
> or "aren't guaranteed to" work, because it's besides the point.
No, it is the core of the issue. Standard C gives the compiler certain
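The standard-conforming way to express such an access in C is a memcpy through a temporary: the compiler is then free to emit a single unaligned load on targets that permit one and byte loads elsewhere, rather than an ldrd with an alignment requirement. A minimal sketch (helper name hypothetical):

#include <stdint.h>
#include <string.h>

/* Reads a 64-bit value from a possibly unaligned address without
   invoking undefined behavior; the memcpy is typically folded into
   whatever load sequence the target actually guarantees. */
static inline uint64_t load_u64_unaligned(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

int main(void) {
    unsigned char buf[12] = {0};
    return (int)load_u64_unaligned(buf + 1);  /* deliberately misaligned */
}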
2007 Sep 25 · 0 · [LLVMdev] RFC: Tail call optimization X86
...ent after the
>>> call to qux. Now FastCC has callee pops arguments on return
>>> semantics
>>> so the
>>> x86 backend inserts a stack adjustment after the call.
>>>
>>> _array:
>>> subl $12, %esp
>>> movss LCPI1_0, %xmm0
>>> mulss 16(%esp), %xmm0
>>> movss %xmm0, (%esp)
>>> call L_qux$stub
>>> subl $4, %esp << stack adjustment because qux pops
>>> arguments on return
>>> mulss LCPI1_0, %xmm0
>...
2017 Apr 20 · 4 · [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...ucomiss LC0(%rip), %xmm0
cvttss2siq %xmm0, %rdx
jb L3
subss LC0(%rip), %xmm0
movabsq $-9223372036854775808, %rax
cvttss2siq %xmm0, %rdx
xorq %rax, %rdx
L3:
movq %rdx, %rax
ret
instead of
_conv:
movss LCPI1_0(%rip), %xmm1
cvttss2siq %xmm0, %rcx
movaps %xmm0, %xmm2
subss %xmm1, %xmm2
cvttss2siq %xmm2, %rax
movabsq $-9223372036854775808, %rdx
xorq %rdx, %rax
ucomiss %xmm1, %xmm0
cmovb %rcx, %rax
ret
On 19 Apr 2017, at 2:10 PM, Michae...
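The branchy version above performs the subtraction only when the input is at least 2^63, so an exact in-range conversion never touches an inexact operation; the cmov version evaluates both paths unconditionally, which is where the spurious FE_INEXACT comes from. A minimal C sketch of the same conversion idiom (the subtraction of 2^63 is exact for any float in that range):

#include <stdint.h>

/* float -> unsigned 64-bit, mirroring the branchy asm: small values use
   the signed conversion directly; large values are rebased by 2^63
   (exact, since such floats are coarse multiples of 2^40) and the high
   bit of the result is restored with an xor. */
uint64_t conv(float x) {
    if (x < 0x1p63f)
        return (uint64_t)(int64_t)x;                        /* cvttss2siq path */
    return (uint64_t)(int64_t)(x - 0x1p63f) ^ (1ULL << 63); /* subss path */
}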
2017 Feb 18 · 2 · Vector trunc code generation difference between llvm-3.9 and 4.0
...he backend and
> produces worse code even for x86 with AVX2:
> before:
> vmovd %edi, %xmm1
> vpmovzxwq %xmm1, %xmm1
> vpsraw %xmm1, %xmm0, %xmm0
> retq
>
> after:
> vmovd %edi, %xmm1
> vpbroadcastd %xmm1, %ymm1
> vmovdqa LCPI1_0(%rip), %ymm2
> vpshufb %ymm2, %ymm1, %ymm1
> vpermq $232, %ymm1, %ymm1
> vpmovzxwd %xmm1, %ymm1
> vpmovsxwd %xmm0, %ymm0
> vpsravd %ymm1, %ymm0, %ymm0
> vpshufb %ymm2, %ymm0, %ymm0
> vpermq $232, %ymm0, %ymm0
> vzeroupper...
2012 Jul 20 · 3 · [LLVMdev] Help with PPC64 JIT
...properly. Using the testcase '2003-01-04-ArgumentBug'
> > the assembly generated for the main function is the following:
> >
> > .L.main:
> > # BB#0:
> > mflr 0
> > std 0, 16(1)
> > stdu 1, -112(1)
> > lis 3, .LCPI1_0@ha
> > li 4, 1
> > lfs 1, .LCPI1_0@l(3)
> > li 3, 0
> > bl foo
> > nop
> > addi 1, 1, 112
> > ld 0, 16(1)
> > mtlr 0
> > blr
> >
> > Which is corr...
2017 Feb 17 · 2 · Vector trunc code generation difference between llvm-3.9 and 4.0
Correction in the C snippet:
typedef signed short v8i16_t __attribute__((ext_vector_type(8)));
v8i16_t foo (v8i16_t a, int n)
{
return a >> n;
}
Best regards
Saurabh
On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma@movidius.com> wrote:
> Hello,
>
> We are investigating a difference in code generation for vector splat
> instructions between llvm-3.9
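For reference, a minimal way to reproduce such a comparison, assuming side-by-side 3.9 and 4.0 builds (the binary names and the file name foo.c are hypothetical; the C snippet above is the input):

clang-3.9 -O2 -mavx2 -S foo.c -o before.s
clang-4.0 -O2 -mavx2 -S foo.c -o after.s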
2017 Mar 08 · 2 · Vector trunc code generation difference between llvm-3.9 and 4.0
...ore:
>>> vmovd %edi, %xmm1
>>> vpmovzxwq %xmm1, %xmm1
>>> vpsraw %xmm1, %xmm0, %xmm0
>>> retq
>>>
>>> after:
>>> vmovd %edi, %xmm1
>>> vpbroadcastd %xmm1, %ymm1
>>> vmovdqa LCPI1_0(%rip), %ymm2
>>> vpshufb %ymm2, %ymm1, %ymm1
>>> vpermq $232, %ymm1, %ymm1
>>> vpmovzxwd %xmm1, %ymm1
>>> vpmovsxwd %xmm0, %ymm0
>>> vpsravd %ymm1, %ymm0, %ymm0
>>> vpshufb %ymm2, %ymm0, %ymm0
>>>...
2012 Mar 28 · 2 · [LLVMdev] Suboptimal code due to excessive spilling
...testl %eax, %eax
je .LBB1_3
# BB#1:
xorl %ebx, %ebx
movl 108(%esp), %ecx
movl 104(%esp), %edx
xorl %esi, %esi
.align 16, 0x90
.LBB1_2: # %.lr.ph.i
# =>This Inner Loop Header: Depth=1
movsd (%edx,%ebx,8), %xmm2
addsd .LCPI1_0, %xmm2
movsd 16(%edx,%ebx,8), %xmm1
movsd %xmm1, (%esp) # 8-byte Spill
movl %ebx, %edi
addl $1, %edi
addsd (%edx,%edi,8), %xmm2
movsd 136(%edx,%ebx,8), %xmm1
movsd %xmm1, 72(%esp) # 8-byte Spill
movsd 128(%edx,%ebx,8), %xmm1
movsd %xmm1, 64(%esp) # 8-byte Spill
m...
2012 Jan 24 · 4 · [LLVMdev] Use of 'ldrd' instructions with unaligned addresses on armv7 (Major bug in LLVM optimizer?)
No one is arguing that there aren't ABI specs or LLVM design
guidelines that say that unaligned accesses "should not", "could not"
or "aren't guaranteed to" work, because it's besides the point.
The point is that unaligned 32-bit loads and stores *work in practice*
on every single ARM device Apple has ever manufactured. I'm not a
hardware person, but
2012 Apr 05 · 0 · [LLVMdev] Suboptimal code due to excessive spilling
...testl %eax, %eax
je .LBB1_3
# BB#1:
xorl %ebx, %ebx
movl 108(%esp), %ecx
movl 104(%esp), %edx
xorl %esi, %esi
.align 16, 0x90
.LBB1_2: # %.lr.ph.i
# =>This Inner Loop Header: Depth=1
movsd (%edx,%ebx,8), %xmm2
addsd .LCPI1_0, %xmm2
movsd 16(%edx,%ebx,8), %xmm1
movsd %xmm1, (%esp) # 8-byte Spill
movl %ebx, %edi
addl $1, %edi
addsd (%edx,%edi,8), %xmm2
movsd 136(%edx,%ebx,8), %xmm1
movsd %xmm1, 72(%esp) # 8-byte Spill
movsd 128(%edx,%ebx,8), %xmm1
movsd %xmm1, 64(%esp) # 8-byte Spill
m...