Displaying 20 results from an estimated 23 matches for "lcpi1_0".
2012 Jul 19 · 2 · [LLVMdev] Help with PPC64 JIT
...g the JIT function code is an ODP.
Now I'm trying to make the relocation work properly. Using the testcase '2003-01-04-ArgumentBug',
the assembly generated for the main function is the following:
.L.main:
# BB#0:
mflr 0
std 0, 16(1)
stdu 1, -112(1)
lis 3, .LCPI1_0@ha
li 4, 1
lfs 1, .LCPI1_0@l(3)
li 3, 0
bl foo
nop
addi 1, 1, 112
ld 0, 16(1)
mtlr 0
blr
Which is correct; however, for the JIT version generated in memory the relocations cause some issues.
First the 'lis 3, .LCP...
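The lis/lfs pair above splits the 32-bit constant-pool offset into a high-adjusted half (@ha) and a signed low half (@l), so a JIT resolver has to patch both instructions consistently. A minimal C sketch of that arithmetic, assuming a 32-bit displacement (the +0x8000 in the @ha half compensates for the sign extension of the low 16 bits; the address value is hypothetical):

#include <stdint.h>
#include <stdio.h>

/* @ha: high 16 bits, adjusted so that adding the sign-extended low
   half reconstructs the original address exactly. */
static uint16_t ha16(uint32_t addr) { return (uint16_t)((addr + 0x8000) >> 16); }
/* @l: signed low 16 bits, as consumed by the displacement of lfs. */
static int16_t lo16(uint32_t addr) { return (int16_t)(addr & 0xFFFF); }

int main(void) {
    uint32_t addr = 0x1234ABCD;  /* hypothetical address of .LCPI1_0 */
    /* "lis 3, ha16" puts ha16 << 16 into r3; the load instruction then
       adds the sign-extended low half, yielding the original address. */
    uint32_t rebuilt = ((uint32_t)ha16(addr) << 16) + (uint32_t)(int32_t)lo16(addr);
    printf("0x%08x == 0x%08x\n", addr, rebuilt);
    return 0;
}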
2012 Jul 20 · 0 · [LLVMdev] Help with PPC64 JIT
...I'm trying to make the relocation work properly. Using the testcase '2003-01-04-ArgumentBug',
> the assembly generated for the main function is the following:
>
> .L.main:
> # BB#0:
> mflr 0
> std 0, 16(1)
> stdu 1, -112(1)
> lis 3, .LCPI1_0@ha
> li 4, 1
> lfs 1, .LCPI1_0@l(3)
> li 3, 0
> bl foo
> nop
> addi 1, 1, 112
> ld 0, 16(1)
> mtlr 0
> blr
>
> Which is correct, however for the JIT one generated in memory the re...
2007 Sep 24 · 2 · [LLVMdev] RFC: Tail call optimization X86
...grep 8 to ..| grep 9 since
> +; with new fastcc has std call semantics causing a stack adjustment
> +; after the function call
>
> Not sure if I understand this. Can you illustrate with an example?
Sure
The code generated used to be
_array:
subl $12, %esp
movss LCPI1_0, %xmm0
mulss 16(%esp), %xmm0
movss %xmm0, (%esp)
call L_qux$stub
mulss LCPI1_0, %xmm0
addl $12, %esp
ret
FastCC used to have caller-pops-arguments semantics, so there was no stack
adjustment after the
call to qux. Now FastCC has callee pops arg...
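The same distinction exists on i386 as cdecl (caller pops) versus stdcall (callee pops): when the callee's ret also pops the arguments, the caller must adjust %esp again after the call before reusing that stack space, which is the extra instruction being discussed. A minimal sketch, assuming a 32-bit x86 build (e.g. gcc -m32); the function names are hypothetical:

#include <stdio.h>

/* cdecl: the caller pops the 4-byte argument after the call returns,
   so a single stack adjustment can cover a run of calls. */
__attribute__((cdecl)) float qux_caller_pops(float x) { return x * 2.0f; }

/* stdcall: the callee pops its own argument ("ret $4"), so the caller
   needs a fresh stack adjustment after each call, as in the asm above. */
__attribute__((stdcall)) float qux_callee_pops(float x) { return x * 2.0f; }

int main(void) {
    printf("%f %f\n", qux_caller_pops(1.5f), qux_callee_pops(1.5f));
    return 0;
}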
2007 Sep 24 · 0 · [LLVMdev] RFC: Tail call optimization X86
...w fastcc has std call semantics causing a stack adjustment
>> +; after the function call
>>
>> Not sure if I understand this. Can you illustrate with an example?
>
> Sure
>
> The code generated used to be
> _array:
> subl $12, %esp
> movss LCPI1_0, %xmm0
> mulss 16(%esp), %xmm0
> movss %xmm0, (%esp)
> call L_qux$stub
> mulss LCPI1_0, %xmm0
> addl $12, %esp
> ret
>
> FastCC used to be caller pops arguments so there was no stack
> adjustment after the
> ca...
2017 Apr 19 · 3 · [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...> jb L3
> subss LC0(%rip), %xmm0
> movabsq $-9223372036854775808, %rax
> cvttss2siq %xmm0, %rdx
> xorq %rax, %rdx
> L3:
> movq %rdx, %rax
> ret
>
> instead of
>
> _conv:
> movss LCPI1_0(%rip), %xmm1
> cvttss2siq %xmm0, %rcx
> movaps %xmm0, %xmm2
> subss %xmm1, %xmm2
> cvttss2siq %xmm2, %rax
> movabsq $-9223372036854775808, %rdx
> xorq %rdx, %rax
> ucomiss %xmm1, %xmm0
> cmovb %rcx, %rax
>...
2007 Sep 25 · 2 · [LLVMdev] RFC: Tail call optimization X86
...there was no stack
> > adjustment after the
> > call to qux. Now FastCC has callee pops arguments on return semantics
> > so the
> > x86 backend inserts a stack adjustment after the call.
> >
> > _array:
> > subl $12, %esp
> > movss LCPI1_0, %xmm0
> > mulss 16(%esp), %xmm0
> > movss %xmm0, (%esp)
> > call L_qux$stub
> > subl $4, %esp << stack adjustment because qux pops
> > arguments on return
> > mulss LCPI1_0, %xmm0
> > addl...
2014 Mar 26 · 3 · [LLVMdev] [cfe-dev] computing a conservatively rounded square of a double
...Interval s;
> s.nlo = x * -x;
> s.hi = x * x;
> return s;
> }
>
> Both multiplies are necessary, since they round in different
> directions. However, clang does not know this, assumes that x * -x =
> -(x * x), and simplifies down to a single multiply:
>
> .LCPI1_0:
> .quad -9223372036854775808 # double -0.000000e+00
> .quad -9223372036854775808 # double -0.000000e+00
> .text
> .globl _Z21inspect_singleton_sqrd
> .align 16, 0x90
> .type _Z21inspect_singleton_sqrd,@function
> _Z21inspect_singleton_sqrd:...
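Under the default round-to-nearest mode x * -x and -(x * x) do agree, which is presumably why the fold looks harmless; the two multiplies diverge only under directed rounding. A minimal sketch showing a one-ulp difference under upward rounding, assuming the platform honors fesetround at run time (compile with FP-environment-respecting options such as -frounding-math):

#include <fenv.h>
#include <stdio.h>

int main(void) {
    volatile double x = 1.0000000000000002;  /* 1 + 2^-52 */
    fesetround(FE_UPWARD);
    volatile double nlo = x * -x;  /* negative product rounds up: magnitude shrinks */
    volatile double hi  = x * x;   /* positive product rounds up: magnitude grows */
    printf("%a vs %a\n", nlo, -hi);  /* differ in the last bit */
    fesetround(FE_TONEAREST);
    return 0;
}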
2007 Sep 24 · 0 · [LLVMdev] RFC: Tail call optimization X86
Hi Arnold,
This is a very good first step! Thanks! Comments below.
Evan
Index: test/CodeGen/X86/constant-pool-remat-0.ll
===================================================================
--- test/CodeGen/X86/constant-pool-remat-0.ll (revision 42247)
+++ test/CodeGen/X86/constant-pool-remat-0.ll (working copy)
@@ -1,8 +1,10 @@
; RUN: llvm-as < %s | llc -march=x86-64 | grep LCPI | count 3
;
2007 Sep 23 · 2 · [LLVMdev] RFC: Tail call optimization X86
The patch is against revision 42247.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tailcall-src.patch
Type: application/octet-stream
Size: 62639 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070923/4770302f/attachment.obj>
2010 Mar 10 · 1 · [LLVMdev] Vectors of length 3 as first-class types
...xpecting it to load the first three lanes of %xmm0. (If returning a
vector of four, %xmm0 is used.) But the generated assembly seems to be
using return by hidden pointer, even though it appears to have allocated
the vector with padding in preparation for this:
.LCPI1_0: # constant pool <4 x i32>
.long 1 # 0x1
.long 2 # 0x2
.long 3 # 0x3
.zero 4
.text
Should this data not be loaded as-is into %xmm0, as in the case of a
vector of four?:
retFour:...
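For reference, the kind of function under discussion can be written with Clang's ext_vector_type; whether the 3-element return comes back in %xmm0 (padded to four lanes, as the constant pool above suggests) or through a hidden sret pointer is exactly the ABI question raised here. A minimal sketch (function name hypothetical):

typedef int v3i32 __attribute__((ext_vector_type(3)));

/* Returns the <3 x i32> constant shown in the pool above; the question
   is whether this lowers to a single %xmm0 load or to a store through
   a hidden pointer argument. */
v3i32 retThree(void) {
    v3i32 r;
    r.x = 1;
    r.y = 2;
    r.z = 3;
    return r;
}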
2012 Jan 24 · 0 · [LLVMdev] Use of 'ldrd' instructions with unaligned addresses on armv7 (Major bug in LLVM optimizer?)
On Tue, Jan 24, 2012 at 08:36:17AM -0800, Esperanza de Escobar wrote:
> No one is arguing that there aren't ABI specs or LLVM design
> guidelines that say that unaligned accesses "should not", "could not"
> or "aren't guaranteed to" work, because it's besides the point.
No, it is the core of the issue. Standard C gives the compiler certain
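The standard-conforming way to express such an access in C is a memcpy through a temporary: the compiler is then free to emit a single unaligned load on targets that permit one and byte loads elsewhere, rather than an ldrd with an alignment requirement. A minimal sketch (helper name hypothetical):

#include <stdint.h>
#include <string.h>

/* Reads a 64-bit value from a possibly unaligned address without
   invoking undefined behavior; the memcpy is typically folded into
   whatever load sequence the target actually guarantees. */
static inline uint64_t load_u64_unaligned(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

int main(void) {
    unsigned char buf[12] = {0};
    return (int)load_u64_unaligned(buf + 1);  /* deliberately misaligned */
}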
2007 Sep 25 · 0 · [LLVMdev] RFC: Tail call optimization X86
...ent after the
>>> call to qux. Now FastCC has callee pops arguments on return
>>> semantics
>>> so the
>>> x86 backend inserts a stack adjustment after the call.
>>>
>>> _array:
>>> subl $12, %esp
>>> movss LCPI1_0, %xmm0
>>> mulss 16(%esp), %xmm0
>>> movss %xmm0, (%esp)
>>> call L_qux$stub
>>> subl $4, %esp << stack adjustment because qux pops
>>> arguments on return
>>> mulss LCPI1_0, %xmm0
>...
2017 Apr 20 · 4 · [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...ucomiss LC0(%rip), %xmm0
cvttss2siq %xmm0, %rdx
jb L3
subss LC0(%rip), %xmm0
movabsq $-9223372036854775808, %rax
cvttss2siq %xmm0, %rdx
xorq %rax, %rdx
L3:
movq %rdx, %rax
ret
instead of
_conv:
movss LCPI1_0(%rip), %xmm1
cvttss2siq %xmm0, %rcx
movaps %xmm0, %xmm2
subss %xmm1, %xmm2
cvttss2siq %xmm2, %rax
movabsq $-9223372036854775808, %rdx
xorq %rdx, %rax
ucomiss %xmm1, %xmm0
cmovb %rcx, %rax
ret
On 19 Apr 2017, at 2:10 PM, Michae...
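The branchy version above performs the subtraction only when the input is at least 2^63, so an exact in-range conversion never touches an inexact operation; the cmov version evaluates both paths unconditionally, which is where the spurious FE_INEXACT comes from. A minimal C sketch of the same conversion idiom (the subtraction of 2^63 is exact for any float in that range):

#include <stdint.h>

/* float -> unsigned 64-bit, mirroring the branchy asm: small values use
   the signed conversion directly; large values are rebased by 2^63
   (exact, since such floats are coarse multiples of 2^40) and the high
   bit of the result is restored with an xor. */
uint64_t conv(float x) {
    if (x < 0x1p63f)
        return (uint64_t)(int64_t)x;                        /* cvttss2siq path */
    return (uint64_t)(int64_t)(x - 0x1p63f) ^ (1ULL << 63); /* subss path */
}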
2017 Feb 18 · 2 · Vector trunc code generation difference between llvm-3.9 and 4.0
...he backend and
> produces worse code even for x86 with AVX2:
> before:
> vmovd %edi, %xmm1
> vpmovzxwq %xmm1, %xmm1
> vpsraw %xmm1, %xmm0, %xmm0
> retq
>
> after:
> vmovd %edi, %xmm1
> vpbroadcastd %xmm1, %ymm1
> vmovdqa LCPI1_0(%rip), %ymm2
> vpshufb %ymm2, %ymm1, %ymm1
> vpermq $232, %ymm1, %ymm1
> vpmovzxwd %xmm1, %ymm1
> vpmovsxwd %xmm0, %ymm0
> vpsravd %ymm1, %ymm0, %ymm0
> vpshufb %ymm2, %ymm0, %ymm0
> vpermq $232, %ymm0, %ymm0
> vzeroupper...
2012 Jul 20 · 3 · [LLVMdev] Help with PPC64 JIT
...properly. Using the testcase '2003-01-04-ArgumentBug'
> > the assembly generated for the main function is the following:
> >
> > .L.main:
> > # BB#0:
> > mflr 0
> > std 0, 16(1)
> > stdu 1, -112(1)
> > lis 3, .LCPI1_0@ha
> > li 4, 1
> > lfs 1, .LCPI1_0@l(3)
> > li 3, 0
> > bl foo
> > nop
> > addi 1, 1, 112
> > ld 0, 16(1)
> > mtlr 0
> > blr
> >
> > Which is corr...
2017 Feb 17 · 2 · Vector trunc code generation difference between llvm-3.9 and 4.0
Correction in the C snippet:
typedef signed short v8i16_t __attribute__((ext_vector_type(8)));
v8i16_t foo (v8i16_t a, int n)
{
return a >> n;
}
Best regards
Saurabh
On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma@movidius.com> wrote:
> Hello,
>
> We are investigating a difference in code generation for vector splat
> instructions between llvm-3.9
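For reference, a minimal way to reproduce such a comparison, assuming side-by-side 3.9 and 4.0 builds (the binary names and the file name foo.c are hypothetical; the C snippet above is the input):

clang-3.9 -O2 -mavx2 -S foo.c -o before.s
clang-4.0 -O2 -mavx2 -S foo.c -o after.s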
2017 Mar 08 · 2 · Vector trunc code generation difference between llvm-3.9 and 4.0
...ore:
>>> vmovd %edi, %xmm1
>>> vpmovzxwq %xmm1, %xmm1
>>> vpsraw %xmm1, %xmm0, %xmm0
>>> retq
>>>
>>> after:
>>> vmovd %edi, %xmm1
>>> vpbroadcastd %xmm1, %ymm1
>>> vmovdqa LCPI1_0(%rip), %ymm2
>>> vpshufb %ymm2, %ymm1, %ymm1
>>> vpermq $232, %ymm1, %ymm1
>>> vpmovzxwd %xmm1, %ymm1
>>> vpmovsxwd %xmm0, %ymm0
>>> vpsravd %ymm1, %ymm0, %ymm0
>>> vpshufb %ymm2, %ymm0, %ymm0
>>>...
2012 Mar 28 · 2 · [LLVMdev] Suboptimal code due to excessive spilling
...testl %eax, %eax
je .LBB1_3
# BB#1:
xorl %ebx, %ebx
movl 108(%esp), %ecx
movl 104(%esp), %edx
xorl %esi, %esi
.align 16, 0x90
.LBB1_2: # %.lr.ph.i
# =>This Inner Loop Header: Depth=1
movsd (%edx,%ebx,8), %xmm2
addsd .LCPI1_0, %xmm2
movsd 16(%edx,%ebx,8), %xmm1
movsd %xmm1, (%esp) # 8-byte Spill
movl %ebx, %edi
addl $1, %edi
addsd (%edx,%edi,8), %xmm2
movsd 136(%edx,%ebx,8), %xmm1
movsd %xmm1, 72(%esp) # 8-byte Spill
movsd 128(%edx,%ebx,8), %xmm1
movsd %xmm1, 64(%esp) # 8-byte Spill
m...
2012 Jan 24 · 4 · [LLVMdev] Use of 'ldrd' instructions with unaligned addresses on armv7 (Major bug in LLVM optimizer?)
No one is arguing that there aren't ABI specs or LLVM design
guidelines that say that unaligned accesses "should not", "could not"
or "aren't guaranteed to" work, because it's besides the point.
The point is that unaligned 32-bit loads and stores *work in practice*
on every single ARM device Apple has ever manufactured. I'm not a
hardware person, but
2012 Apr 05 · 0 · [LLVMdev] Suboptimal code due to excessive spilling
...testl %eax, %eax
je .LBB1_3
# BB#1:
xorl %ebx, %ebx
movl 108(%esp), %ecx
movl 104(%esp), %edx
xorl %esi, %esi
.align 16, 0x90
.LBB1_2: # %.lr.ph.i
# =>This Inner Loop Header: Depth=1
movsd (%edx,%ebx,8), %xmm2
addsd .LCPI1_0, %xmm2
movsd 16(%edx,%ebx,8), %xmm1
movsd %xmm1, (%esp) # 8-byte Spill
movl %ebx, %edi
addl $1, %edi
addsd (%edx,%edi,8), %xmm2
movsd 136(%edx,%ebx,8), %xmm1
movsd %xmm1, 72(%esp) # 8-byte Spill
movsd 128(%edx,%ebx,8), %xmm1
movsd %xmm1, 64(%esp) # 8-byte Spill
m...