Displaying 20 results from an estimated 45 matches for "lbb1_1".
2009 Mar 03
3
[LLVMdev] Tight overlapping loops and performance
...case.
The crux of the example still seems intact. From LLVM SVN, converted to asm via llc:
.text
.align 4,0x90
.globl _main
_main:
subl $12, %esp
movl $1999, %eax
xorl %ecx, %ecx
movl $1999, %edx
.align 4,0x90
LBB1_1: ## loopto
cmpl $1, %eax
leal -1(%eax), %eax
cmove %edx, %eax
incl %ecx
cmpl $999999999, %ecx
jne LBB1_1 ## loopto
LBB1_2: ## bb1
movl %eax, 4(%esp)
movl $LC, (%esp)
call _printf
xorl %eax...
2012 Apr 25
3
[LLVMdev] Not enough optimisations in the SelectionDAG phase?
...3, $3, 1032
lw $3, 0($3)
bltz $3, $BB0_1
nop
# BB#2:
The two operations, lui and ori, which are used to calculate the memory address,
are actually loop invariants. They are supposed to be moved out of the loop. I
thought it might be a limitation of the MIPS backend. Then I tried the ARM
backend,
.LBB1_1:
ldr r2, .LCPI1_2
ldr r2, [r2]
cmp r2, #0
blt .LBB1_1
@ BB#2:
The first ldr instruction loads the address from the constant pool. It
should also be outside the loop.
I'm not sure if this is because the optimisations in the common
SelectionDAG optimisation phase are not enough, or...
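The hoisting the poster expects can be sketched in C. This is only an illustration: the original source of the loop is not shown in the excerpt, so the global `status_word`, the function name, and the `max_spins` bound (added so the sketch terminates) are all assumptions.

```c
/* Hypothetical global polled by the loop; the poster's source is not shown. */
volatile int status_word = 0;

/*
 * The address of status_word is formed once, before the loop. Only the
 * load of the value is repeated, which is the shape the poster expected
 * the backend to produce (address computation outside the loop body).
 * max_spins is an illustrative bound so the sketch always terminates.
 */
int spin_until_nonnegative(int max_spins)
{
    volatile int *addr = &status_word;   /* loop-invariant address */
    int spins = 0;

    while (*addr < 0 && spins < max_spins)
        ++spins;                         /* value is reloaded each pass */

    return spins;
}
```

With `status_word` set to a non-negative value the function returns 0 immediately; with a negative value it spins until the bound is reached.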
2013 Apr 03
2
[LLVMdev] Packed instructions generaetd by LoopVectorize?
...t and double instructions? Is this a bug, or am I doing something wrong?
Tyler
float dotproduct(float *A, float *B, int n) {
float sum = 0;
for(int i = 0; i < n; ++i) {
sum += A[i] * B[i];
}
return sum;
}
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
.LBB1_1:
movss (%rdi), %xmm1
addq $4, %rdi
mulss (%rsi), %xmm1
addq $4, %rsi
decl %edx
addss %xmm1, %xmm0
jne .LBB1_1
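One plausible explanation, assumed here rather than taken from the excerpt, is that vectorizing this reduction means reordering floating-point additions, which a compiler normally does only when reassociation is permitted (for example under fast-math style options). The sketch below contrasts the strict in-order sum with a four-lane partial-sum version; the function names are hypothetical.

```c
/* In-order reduction: additions happen in exactly the source order. */
float dot_scalar(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/*
 * Four independent partial sums, combined at the end. This is the shape
 * a packed (4 x float) loop would compute. The additions are reassociated,
 * so the rounded result may differ from dot_scalar for general inputs.
 */
float dot_four_lanes(const float *a, const float *b, int n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i = 0;

    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            lane[k] += a[i + k] * b[i + k];

    float sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);

    for (; i < n; ++i)                   /* scalar remainder */
        sum += a[i] * b[i];
    return sum;
}
```

For inputs whose products and sums are exactly representable the two versions agree; in general they can differ in the last bits, which is why the transformation is not applied by default.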
2018 Nov 26
3
BUGS in code generated for target i386-win32
...lfsr & 0x80000000 ? 0x04C11DB7 ^ (lfsr << 1) : lfsr << 1;
#else
lfsr = lfsr32(lfsr, 0x04C11DB7);
#endif
} while (lfsr != 123456789);
return period;
}
--- EOF ---
Compiled with -O2 -target i386-win32 this yields the following code:
_main: # @main
xor edx, edx
LBB1_1: # =>This Inner Loop Header: Depth=1
add ecx, ecx
sbb eax, eax
and eax, edx
xor eax, ecx
inc edx
cmp eax, 123456789
jne LBB1_1
mov eax, edx
ret
BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
~~~~~~~
It fails to load the resu...
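For reference, the step that the `add`/`sbb`/`and`/`xor` sequence is meant to compute can be written in C as below. This is a sketch, not the poster's code: the definition of `lfsr32` is not shown in the excerpt, so both function names and the parameter `poly` are assumptions.

```c
#include <stdint.h>

/* Branching form, matching the conditional expression in the excerpt. */
uint32_t lfsr_step_branch(uint32_t lfsr, uint32_t poly)
{
    return (lfsr & 0x80000000u) ? poly ^ (lfsr << 1) : lfsr << 1;
}

/*
 * Branchless form. mask is all ones when the top bit of lfsr is set and
 * zero otherwise, which is the value `sbb eax, eax` yields after the
 * shifted-out bit lands in the carry flag. The mask must be ANDed with
 * the polynomial.
 */
uint32_t lfsr_step_mask(uint32_t lfsr, uint32_t poly)
{
    uint32_t mask = 0u - (lfsr >> 31);
    return (lfsr << 1) ^ (mask & poly);
}
```

In the listing above, the `and` uses `edx`, which was zeroed at function entry and is incremented in the loop as the period counter, rather than the constant 0x04C11DB7.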
2013 Jan 23
2
[LLVMdev] introducing sign extending halfword loads into the LLVM IR
...> I used opt -O3 and llc -O3 -march=arm -regalloc=greedy, and here is the code that is generated for the loop body (and two instructions that set a loop-invariant mask beforehand), with some comments of mine:
>
> mov r12, #255
> orr r12, r12, #65280
> LBB1_1:
> ldrsh r3, [r1] # loads a short that is sign-extended to 32 bits
> mov r4, lr
> cmp r3, #2048
> bge .LBB1_3
> and r4, r3, r12 # mask with 0xffff to convert to short again
> lsl r4...
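The three operations commented in the listing can be modelled in C as follows. This is a rough sketch with hypothetical function names; it mirrors only the instructions visible in the excerpt, not the surrounding loop.

```c
#include <stdint.h>

/* mov r12, #255 ; orr r12, r12, #65280  ->  builds the 0xffff mask. */
uint32_t build_halfword_mask(void)
{
    return 255u | 65280u;
}

/* ldrsh r3, [r1]  ->  16-bit load, sign-extended to 32 bits. */
int32_t load_sign_extended(const int16_t *p)
{
    return *p;
}

/* and r4, r3, r12  ->  keep only the low 16 bits of the widened value. */
uint32_t low_halfword(int32_t v)
{
    return (uint32_t)v & build_halfword_mask();
}
```

For a stored halfword of -2, `load_sign_extended` gives -2 (0xfffffffe as a 32-bit pattern) and `low_halfword` gives 0xfffe.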
2013 Apr 03
0
[LLVMdev] Packed instructions generaetd by LoopVectorize?
...> Tyler
>
> float dotproduct(float *A, float *B, int n) {
> float sum = 0;
> for(int i = 0; i < n; ++i) {
> sum += A[i] * B[i];
> }
> return sum;
> }
>
> clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
>
> <loop body>
> .LBB1_1:
> movss (%rdi), %xmm1
> addq $4, %rdi
> mulss (%rsi), %xmm1
> addq $4, %rsi
> decl %edx
> addss %xmm1, %xmm0
> jne .LBB1_1
2009 Mar 02
0
[LLVMdev] Tight overlapping loops and performance
On Mon, Mar 2, 2009 at 2:45 PM, Jonathan Turner <probata at hotmail.com> wrote:
> For which version of gcc? I should mention I'm on OS X and using the LLVM
> SVN.
gcc 4.3. It's also possible this is processor-sensitive.
>> First, try looking at the generated code... the code LLVM generates is
>> probably not what you're expecting. I'm getting the
2013 Apr 04
1
[LLVMdev] Packed instructions generaetd by LoopVectorize?
...t and double instructions? Is this a bug, or am I doing something wrong?
Tyler
float dotproduct(float *A, float *B, int n) {
float sum = 0;
for(int i = 0; i < n; ++i) {
sum += A[i] * B[i];
}
return sum;
}
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
.LBB1_1:
movss (%rdi), %xmm1
addq $4, %rdi
mulss (%rsi), %xmm1
addq $4, %rsi
decl %edx
addss %xmm1, %xmm0
jne .LBB1_1
2012 Apr 29
0
[LLVMdev] Not enough optimisations in the SelectionDAG phase?
...$3, $BB0_1
> nop
> # BB#2:
>
>
> The two operations, lui and ori, which are used to calculate the memory address, are actually loop invariants. They are supposed to be moved out of the loop. I thought it might be a limitation of the MIPS backend. Then I tried the ARM backend,
>
> .LBB1_1:
> ldr r2, .LCPI1_2
> ldr r2, [r2]
> cmp r2, #0
> blt .LBB1_1
> @ BB#2:
>
> The first ldr instruction loads the address from the constant pool. It should also be outside the loop.
>
> I'm not sure if this is because of the optimisations...
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...FFFFFF if CF set, else 0
mov eax, edx | and edx, -306674912
cmovns eax, r8d | xor eax, edx
add ecx, 1
jne .LBB0_3
jmp .LBB0_4
.LBB0_5:
ret
crc32le: # @crc32le
test esi, esi
je .LBB1_1
mov eax, -1
.LBB1_4: # =>This Loop Header: Depth=1
add esi, -1
movzx ecx, byte ptr [rdi]
xor eax, ecx
mov r8d, -8
.LBB1_5: # Parent Loop BB1_4 Depth=1 | # 4 instructions instead of 7, and
mov edx, eax | # neither r...
2009 Mar 02
3
[LLVMdev] Tight overlapping loops and performance
> Date: Mon, 2 Mar 2009 13:41:45 -0800
> From: eli.friedman at gmail.com
> To: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Tight overlapping loops and performance
>
> Hmm, on my computer, I get around 2.5 seconds with both gcc -O3 and
> llvm-gcc -O3 (using llvm-gcc from svn). Not sure what you're doing
> differently; I wouldn't be surprised if it's
2017 Jul 17
2
A bug related with undef value when bootstrap MemorySSA.cpp
...- Begin function hoo
74 .p2align 4, 0x90
75 .type hoo,@function
76 hoo: # @hoo
77 .cfi_startproc
78 # BB#0:
79 movq a(%rip), %rax
80 movq cnt(%rip), %rcx
81 cmpq $0, i_hasval(%rip)
82 sete %sil
83 xorl %edx, %edx
84 .p2align 4, 0x90
85 .LBB1_1: # =>This Inner Loop Header: Depth=1
86 testb $1, %sil
87 je .LBB1_3
88 # BB#2: # in Loop: Header=BB1_1 Depth=1
89 movq b(%rip), %rsi
90 addq %rax, %rsi
91 movq %rsi, c(%rip)
92 movq $3, i_hasval(%rip)
93 incq %...
2012 Apr 29
1
[LLVMdev] Not enough optimisations in the SelectionDAG phase?
...>> # BB#2:
>>
>>
>> The two operations, lui and ori, which are used to calculate the memory address, are actually loop invariants. They are supposed to be moved out of the loop. I thought it might be a limitation of the MIPS backend. Then I tried the ARM backend,
>>
>> .LBB1_1:
>> ldr r2, .LCPI1_2
>> ldr r2, [r2]
>> cmp r2, #0
>> blt .LBB1_1
>> @ BB#2:
>>
>> The first ldr instruction loads the address from the constant pool. It also should be outside the loop.
>>
>> I'm not sure if this is because of...
2017 Jul 17
3
A bug related with undef value when bootstrap MemorySSA.cpp
...# @hoo
>> 77 .cfi_startproc
>> 78 # BB#0:
>> 79 movq a(%rip), %rax
>> 80 movq cnt(%rip), %rcx
>> 81 cmpq $0, i_hasval(%rip)
>> 82 sete %sil
>> 83 xorl %edx, %edx
>> 84 .p2align 4, 0x90
>> 85 .LBB1_1: # =>This Inner Loop Header: Depth=1
>> 86 testb $1, %sil
>> 87 je .LBB1_3
>> 88 # BB#2: # in Loop: Header=BB1_1 Depth=1
>> 89 movq b(%rip), %rsi
>> 90 addq %rax, %rsi
>...
2017 Jul 17
3
A bug related with undef value when bootstrap MemorySSA.cpp
...>> >> 78 # BB#0:
>> >> 79 movq a(%rip), %rax
>> >> 80 movq cnt(%rip), %rcx
>> >> 81 cmpq $0, i_hasval(%rip)
>> >> 82 sete %sil
>> >> 83 xorl %edx, %edx
>> >> 84 .p2align 4, 0x90
>> >> 85 .LBB1_1: # =>This Inner Loop Header: Depth=1
>> >> 86 testb $1, %sil
>> >> 87 je .LBB1_3
>> >> 88 # BB#2: # in Loop: Header=BB1_1 Depth=1
>> >> 89 m...
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...-306674912
>> cmovns eax, r8d | xor eax, edx
>> add ecx, 1
>> jne .LBB0_3
>> jmp .LBB0_4
>> .LBB0_5:
>> ret
>> crc32le: # @crc32le
>> test esi, esi
>> je .LBB1_1
>> mov eax, -1
>> .LBB1_4: # =>This Loop Header: Depth=1
>> add esi, -1
>> movzx ecx, byte ptr [rdi]
>> xor eax, ecx
>> mov r8d, -8
>> .LBB1_5: # Parent Loop BB1_4 Depth=1 | # 4 instructions instead...
2013 Jan 24
0
[LLVMdev] introducing sign extending halfword loads into the LLVM IR
...-O3 and llc -O3 -march=arm -regalloc=greedy, and here is the code that is generated for the loop body (and two instructions that set a loop-invariant mask beforehand), with some comments of mine:
>>
>> mov r12, #255
>> orr r12, r12, #65280
>> LBB1_1:
>> ldrsh r3, [r1] # loads a short that is sign-extended to 32 bits
>> mov r4, lr
>> cmp r3, #2048
>> bge .LBB1_3
>> and r4, r3, r12 # mask with 0xffff to convert to short again
>>...
2017 Jul 18
4
A bug related with undef value when bootstrap MemorySSA.cpp
...movq a(%rip), %rax
>>>> >> 80 movq cnt(%rip), %rcx
>>>> >> 81 cmpq $0, i_hasval(%rip)
>>>> >> 82 sete %sil
>>>> >> 83 xorl %edx, %edx
>>>> >> 84 .p2align 4, 0x90
>>>> >> 85 .LBB1_1: # =>This Inner Loop Header: Depth=1
>>>> >> 86 testb $1, %sil
>>>> >> 87 je .LBB1_3
>>>> >> 88 # BB#2: # in Loop: Header=BB1_1
>>>> ...
2009 Mar 02
0
[LLVMdev] Tight overlapping loops and performance
...LVM.
> Should I be looking at any particular optimization passes that aren't in
> -std-compile-opts to match the gcc speeds?
First, try looking at the generated code... the code LLVM generates is
probably not what you're expecting. I'm getting the following for the
main loop:
.LBB1_1: # loopto
cmpl $1, %eax
leal -1(%eax), %eax
cmove %edx, %eax
incl %ecx
cmpl $999999999, %ecx
jne .LBB1_1 # loopto
LLVM is optimizing your oddly nested loops into a single loop which
does some extra computation to keep track of the timeout variable.
Since you'd normally be doing something...
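The single loop at .LBB1_1 can be read back into C roughly as follows. This is a sketch inferred from the assembly alone: the poster's original nested-loop source is not shown, so the function name, the variable names, and the `iterations` parameter (999999999 in the listing) are assumptions. The model matches the assembly for `iterations` of 1 or more, since the assembly tests the counter only after the first pass.

```c
#include <stdint.h>

/*
 * Hypothetical C model of the fused loop:
 *   timeout  ~ %eax, starts at 1999
 *   i        ~ %ecx, the single remaining loop counter
 * Each pass decrements timeout, and resets it to 1999 (%edx) when the
 * old value was 1 (cmpl $1 / leal -1 / cmove).
 */
int32_t fused_loop(int32_t iterations)
{
    int32_t timeout = 1999;

    for (int32_t i = 0; i != iterations; ++i)
        timeout = (timeout == 1) ? 1999 : timeout - 1;

    return timeout;
}
```

The per-pass reset is the extra computation mentioned above: it stands in for the inner loop's own counter and exit test.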
2014 Feb 19
2
[LLVMdev] better code for IV
...decq %rcx
jne .LBB0_1
# BB#2:
ret
This is what I want to get:
ArrayAdd2: # @ArrayAdd2
.cfi_startproc
# BB#0: # %Entry
xorl %eax, %eax
.align 16, 0x90
.LBB1_1: # %L_entry
# =>This Inner Loop Header: Depth=1
movslq %eax, %r8
movss (%rdi,%r8,4), %xmm0
addss (%rsi,%r8,4), %xmm0
movss %xmm0, (%rdx,%r8,4)
incq %ra...