thr3ads.net - search: "lbb1

2009 Mar 03

3

[LLVMdev] Tight overlapping loops and performance

...case. The crux of the example still seems intact. From LLVM SVN, converted to asm via llc:
        .text
        .align 4,0x90
        .globl _main
    _main:
        subl $12, %esp
        movl $1999, %eax
        xorl %ecx, %ecx
        movl $1999, %edx
        .align 4,0x90
    LBB1_1: ## loopto
        cmpl $1, %eax
        leal -1(%eax), %eax
        cmove %edx, %eax
        incl %ecx
        cmpl $999999999, %ecx
        jne LBB1_1 ## loopto
    LBB1_2: ## bb1
        movl %eax, 4(%esp)
        movl $LC, (%esp)
        call _printf
        xorl %eax...

2012 Apr 25

3

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

...3, $3, 1032
        lw $3, 0($3)
        bltz $3, $BB0_1
        nop
    # BB#2:
The two operation lui and ori which are used to calculate memory address actually are loop invariants. They supposed to be moved out of the loop. I thought it might be a limitation of the MIPS backend. Then I tried the ARM backend,
    .LBB1_1:
        ldr r2, .LCPI1_2
        ldr r2, [r2]
        cmp r2, #0
        blt .LBB1_1
    @ BB#2:
The first ldr instruction is to load the address from constant pool. It also should be outside the loop. I'm not sure if this is because of the optimisations are not enough in the common SelectionDAG optimisation phase, or...

2013 Apr 03

2

[LLVMdev] Packed instructions generaetd by LoopVectorize?

...t and double instructions? Is this a bug, or am I doing something wrong?
Tyler
    float dotproduct(float *A, float *B, int n) {
        float sum = 0;
        for(int i = 0; i < n; ++i) {
            sum += A[i] * B[i];
        }
        return sum;
    }
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
    .LBB1_1:
        movss (%rdi), %xmm1
        addq $4, %rdi
        mulss (%rsi), %xmm1
        addq $4, %rsi
        decl %edx
        addss %xmm1, %xmm0
        jne .LBB1_1
...

2018 Nov 26

3

BUGS in code generated for target i386-win32

...lfsr & 0x80000000 ? 0x04C11DB7 ^ (lfsr << 1) : lfsr << 1;
    #else
            lfsr = lfsr32(lfsr, 0x04C11DB7);
    #endif
        } while (lfsr != 123456789);
        return period;
    }
--- EOF ---
Compiled with -O2 -target i386-win32 this yields the following code:
    _main: # @main
        xor edx, edx
    LBB1_1: # =>This Inner Loop Header: Depth=1
        add ecx, ecx
        sbb eax, eax
        and eax, edx
        xor eax, ecx
        inc edx
        cmp eax, 123456789
        jne LBB1_1
        mov eax, edx
        ret
BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
~~~~~~~
It fails to load the resu...
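The add/sbb/and/xor sequence in the listing above has the shape of a branchless rewrite of the quoted ternary: a mask that is all ones when the top bit is set selects whether the polynomial is XORed in. The sketch below shows that equivalence in C. The function names `lfsr_step_branch` and `lfsr_step_mask` are hypothetical and do not come from the thread, and the sketch makes no claim about the bugs the post reports.

```c
#include <stdint.h>

/* Hypothetical name: the branching form, as written in the quoted source. */
uint32_t lfsr_step_branch(uint32_t lfsr, uint32_t poly)
{
    return (lfsr & 0x80000000u) ? poly ^ (lfsr << 1) : lfsr << 1;
}

/* Hypothetical name: the branchless form. `mask` is 0xFFFFFFFF when the
 * top bit of `lfsr` is set and 0 otherwise, which is what `sbb eax, eax`
 * produces from the carry of `add ecx, ecx`. */
uint32_t lfsr_step_mask(uint32_t lfsr, uint32_t poly)
{
    uint32_t mask = 0u - (lfsr >> 31);
    return (lfsr << 1) ^ (poly & mask);
}
```

Both functions should return the same value for every 32-bit input, so either can stand in for the loop body when comparing generated code.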

2013 Jan 23

2

[LLVMdev] introducing sign extending halfword loads into the LLVM IR

...> I used opt -O3 and llc -O3 -march=arm -regalloc=greedy, and here is the code that is generated for the loop body (and two instructions that set a loop-invariant mask beforehand), with some comments of mine:
>
> mov r12, #255
> orr r12, r12, #65280
> LBB1_1:
> ldrsh r3, [r1] # loads a short that is sign-extended to 32 bits
> mov r4, lr
> cmp r3, #2048
> bge .LBB1_3
> and r4, r3, r12 # mask with 0xffff to convert to short again
> lsl r4...

2013 Apr 03

0

[LLVMdev] Packed instructions generaetd by LoopVectorize?

...> Tyler
>
> float dotproduct(float *A, float *B, int n) {
> float sum = 0;
> for(int i = 0; i < n; ++i) {
> sum += A[i] * B[i];
> }
> return sum;
> }
>
> clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
>
> <loop body>
> .LBB1_1:
> movss (%rdi), %xmm1
> addq $4, %rdi
> mulss (%rsi), %xmm1
> addq $4, %rsi
> decl %edx
> addss %xmm1, %xmm0
> jne .LBB1_1
...

2009 Mar 02

0

[LLVMdev] Tight overlapping loops and performance

On Mon, Mar 2, 2009 at 2:45 PM, Jonathan Turner <probata at hotmail.com> wrote:
> For which version of gcc? I should mention I'm on OS X and using the LLVM
> SVN.
gcc 4.3. It's also possible this is processor-sensitive.
>> First, try looking at the generated code... the code LLVM generates is
>> probably not what you're expecting. I'm getting the

2013 Apr 04

1

[LLVMdev] Packed instructions generaetd by LoopVectorize?

...t and double instructions? Is this a bug, or am I doing something wrong?
Tyler
    float dotproduct(float *A, float *B, int n) {
        float sum = 0;
        for(int i = 0; i < n; ++i) {
            sum += A[i] * B[i];
        }
        return sum;
    }
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
    .LBB1_1:
        movss (%rdi), %xmm1
        addq $4, %rdi
        mulss (%rsi), %xmm1
        addq $4, %rsi
        decl %edx
        addss %xmm1, %xmm0
        jne .LBB1_1
...

2012 Apr 29

0

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

...$3, $BB0_1
> nop
> # BB#2:
>
>
> The two operation lui and ori which are used to calculate memory address actually are loop invariants. They supposed to be moved out of the loop. I thought it might be a limitation of the MIPS backend. Then I tried the ARM backend,
>
> .LBB1_1:
> ldr r2, .LCPI1_2
> ldr r2, [r2]
> cmp r2, #0
> blt .LBB1_1
> @ BB#2:
>
> The first ldr instruction is to load the address from constant pool. It also should be outside the loop.
>
> I'm not sure if this is because of the optimisations...

2018 Nov 06

4

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

...FFFFFF if CF set, else 0 mov eax, edx | and edx, -306674912 cmovns eax, r8d | xor eax, edx add ecx, 1 jne .LBB0_3 jmp .LBB0_4 .LBB0_5: ret crc32le: # @crc32le test esi, esi je .LBB1_1 mov eax, -1 .LBB1_4: # =>This Loop Header: Depth=1 add esi, -1 movzx ecx, byte ptr [rdi] xor eax, ecx mov r8d, -8 .LBB1_5: # Parent Loop BB1_4 Depth=1 | # 4 instructions instead of 7, and mov edx, eax | # neither r...

2009 Mar 02

3

[LLVMdev] Tight overlapping loops and performance

> Date: Mon, 2 Mar 2009 13:41:45 -0800
> From: eli.friedman at gmail.com
> To: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Tight overlapping loops and performance
>
> Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and
> llvm-gcc -O3 (using llvm-gcc from svn). Not sure what you're doing
> differently; I wouldn't be surprised if it's

2017 Jul 17

2

A bug related with undef value when bootstrap MemorySSA.cpp

...- Begin function hoo
74 .p2align 4, 0x90
75 .type hoo,@function
76 hoo: # @hoo
77 .cfi_startproc
78 # BB#0:
79 movq a(%rip), %rax
80 movq cnt(%rip), %rcx
81 cmpq $0, i_hasval(%rip)
82 sete %sil
83 xorl %edx, %edx
84 .p2align 4, 0x90
85 .LBB1_1: # =>This Inner Loop Header: Depth=1
86 testb $1, %sil
87 je .LBB1_3
88 # BB#2: # in Loop: Header=BB1_1 Depth=1
89 movq b(%rip), %rsi
90 addq %rax, %rsi
91 movq %rsi, c(%rip)
92 movq $3, i_hasval(%rip)
93 incq %...

2012 Apr 29

1

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

...>> # BB#2:
>>
>>
>> The two operation lui and ori which are used to calculate memory address actually are loop invariants. They supposed to be moved out of the loop. I thought it might be a limitation of the MIPS backend. Then I tried the ARM backend,
>>
>> .LBB1_1:
>> ldr r2, .LCPI1_2
>> ldr r2, [r2]
>> cmp r2, #0
>> blt .LBB1_1
>> @ BB#2:
>>
>> The first ldr instruction is to load the address from constant pool. It also should be outside the loop.
>>
>> I'm not sure if this is because of...

2017 Jul 17

3

A bug related with undef value when bootstrap MemorySSA.cpp

...# @hoo
>> 77 .cfi_startproc
>> 78 # BB#0:
>> 79 movq a(%rip), %rax
>> 80 movq cnt(%rip), %rcx
>> 81 cmpq $0, i_hasval(%rip)
>> 82 sete %sil
>> 83 xorl %edx, %edx
>> 84 .p2align 4, 0x90
>> 85 .LBB1_1: # =>This Inner Loop Header:
>> Depth=1
>> 86 testb $1, %sil
>> 87 je .LBB1_3
>> 88 # BB#2: # in Loop: Header=BB1_1
>> Depth=1
>> 89 movq b(%rip), %rsi
>> 90 addq %rax, %rsi
>...

2017 Jul 17

3

A bug related with undef value when bootstrap MemorySSA.cpp

...>> >> 78 # BB#0:
>> >> 79 movq a(%rip), %rax
>> >> 80 movq cnt(%rip), %rcx
>> >> 81 cmpq $0, i_hasval(%rip)
>> >> 82 sete %sil
>> >> 83 xorl %edx, %edx
>> >> 84 .p2align 4, 0x90
>> >> 85 .LBB1_1: # =>This Inner Loop Header:
>> >> Depth=1
>> >> 86 testb $1, %sil
>> >> 87 je .LBB1_3
>> >> 88 # BB#2: # in Loop: Header=BB1_1
>> >> Depth=1
>> >> 89 m...

2018 Nov 27

2

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

...-306674912 >> cmovns eax, r8d | xor eax, edx >> add ecx, 1 >> jne .LBB0_3 >> jmp .LBB0_4 >> .LBB0_5: >> ret >> crc32le: # @crc32le >> test esi, esi >> je .LBB1_1 >> mov eax, -1 >> .LBB1_4: # =>This Loop Header: Depth=1 >> add esi, -1 >> movzx ecx, byte ptr [rdi] >> xor eax, ecx >> mov r8d, -8 >> .LBB1_5: # Parent Loop BB1_4 Depth=1 | # 4 instructions instead...

2013 Jan 24

0

[LLVMdev] introducing sign extending halfword loads into the LLVM IR

...-O3 and llc -O3 -march=arm -regalloc=greedy, and here is the code that is generated for the loop body (and two instructions that set a loop-invariant mask beforehand), with some comments of mine:
>>
>> mov r12, #255
>> orr r12, r12, #65280
>> LBB1_1:
>> ldrsh r3, [r1] # loads a short that is sign-extended to 32 bits
>> mov r4, lr
>> cmp r3, #2048
>> bge .LBB1_3
>> and r4, r3, r12 # mask with 0xffff to convert to short again
>>...

2017 Jul 18

4

A bug related with undef value when bootstrap MemorySSA.cpp

...movq a(%rip), %rax
>>>> >> 80 movq cnt(%rip), %rcx
>>>> >> 81 cmpq $0, i_hasval(%rip)
>>>> >> 82 sete %sil
>>>> >> 83 xorl %edx, %edx
>>>> >> 84 .p2align 4, 0x90
>>>> >> 85 .LBB1_1: # =>This Inner Loop Header:
>>>> >> Depth=1
>>>> >> 86 testb $1, %sil
>>>> >> 87 je .LBB1_3
>>>> >> 88 # BB#2: # in Loop: Header=BB1_1
>>>> >...

2009 Mar 02

0

[LLVMdev] Tight overlapping loops and performance

...LVM.
> Should I be looking at any particular optimization passes that aren't in
> -std-compile-opts to match the gcc speeds?
First, try looking at the generated code... the code LLVM generates is probably not what you're expecting. I'm getting the following for the main loop:
    .LBB1_1: # loopto
        cmpl $1, %eax
        leal -1(%eax), %eax
        cmove %edx, %eax
        incl %ecx
        cmpl $999999999, %ecx
        jne .LBB1_1 # loopto
LLVM is optimizing your oddly nested loops into a single loop which does some extra computation to keep track of the timeout variable. Since you'd normally be doing something...
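Read back into C, the quoted loop is roughly the following. This is a sketch reconstructed from the assembly alone, not the original poster's program, which is not shown in the snippet. The function name `run_merged_loop` and its parameters are hypothetical; in the listing the reload value is 1999 and the iteration count is 999999999.

```c
/* Hypothetical reconstruction of the single merged loop.
 * `timeout` plays the role of %eax, `count` of %ecx, `reload` of %edx.
 * Like the assembly, the body runs at least once, so `iterations`
 * is assumed to be >= 1. */
int run_merged_loop(int reload, int iterations)
{
    int timeout = reload;
    int count = 0;
    do {
        /* cmpl $1 / leal -1 / cmove: count down, reloading instead of
         * ever reaching zero. */
        timeout = (timeout == 1) ? reload : timeout - 1;
        ++count;                       /* incl %ecx */
    } while (count != iterations);     /* cmpl / jne */
    return timeout;
}
```

The conditional move keeps the countdown in the same loop as the iteration counter, which is the "extra computation" the reply refers to.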

2014 Feb 19

2

[LLVMdev] better code for IV

...decq %rcx
        jne .LBB0_1
    # BB#2:
        Ret
This is what I want to get:
    ArrayAdd2: # @ArrayAdd2
        .cfi_startproc
    # BB#0: # %Entry
        xorl %eax, %eax
        .align 16, 0x90
    .LBB1_1: # %L_entry
             # =>This Inner Loop Header: Depth=1
        movslq %eax, %r8
        movss (%rdi,%r8,4), %xmm0
        addss (%rsi,%r8,4), %xmm0
        movss %xmm0, (%rdx,%r8,4)
        incq %ra...
