Displaying 20 results from an estimated 25 matches for "lbb0_5".
2019 Jun 30 (6): [hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
...M 9.0 :
.text
.file "main.c"
.globl hexagon2 // -- Begin function hexagon2
.p2align 2
.type hexagon2,@function
hexagon2: // @hexagon2
// %bb.0: // %entry.old
{
p0 = cmp.gtu(r0,r1); if (p0.new) jump:nt .LBB0_5
r2 = r0
allocframe(#0)
} // encoding: [A,0x41'A',A,0x15'A',0x00,0x3c,0x02,0x70]
// fixup A - offset: 0, value: .LBB0_5, kind: fixup_Hexagon_B9_PCREL
// %bb.1: // %entry.old
{...
2015 Sep 30 (2): Optimizing jumps to identical code blocks
...:
https://gist.github.com/ranma42/d2e6d50999e801ffd4ed
(based on two examples available in Rust issues:
https://github.com/rust-lang/rust/pull/24270#issuecomment-136681741
https://github.com/rust-lang/rust/issues/13623#issuecomment-136700526 )
In "enum4.s"
cmpl $1, %eax
je LBB0_5
cmpl $2, %eax
je LBB0_5
cmpl $3, %eax
LBB0_5:
could be removed.
(Further optimization would be possible by observing that the two 32-bit
comparisons could be unified into a single 64-bit comparison, but I believe
this is a different issue)
In "enum6.s" all of the ele...
2019 Jul 01 (0): [hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
...align 2
.type hexagon2,@function
hexagon2: // @hexagon2
// %bb.0: // %entry.old
{
p0 = cmp.gtu(r0,r1); if (p0.new) jump:nt .LBB0_5
r2 = r0
allocframe(#0)
} // encoding: [A,0x41'A',A,0x15'A',0x00,0x3c,0x02,0x70]
// fixup A...
2018 Nov 06 (4): Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...is
crc >>= 1; // rather poor!
}
return ~crc;
}
See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)
crc32be: # @crc32be
xor eax, eax
test esi, esi
jne .LBB0_2
jmp .LBB0_5
.LBB0_4: # in Loop: Header=BB0_2 Depth=1
add rdi, 1
test esi, esi
je .LBB0_5
.LBB0_2: # =>This Loop Header: Depth=1
add esi, -1
movzx edx, byte ptr [rdi]
shl edx, 24
xor edx, eax
mov ecx, -8
mov eax,...
2015 Mar 03 (2): [LLVMdev] Need a clue to improve the optimization of some C code
...hould I rather go to the code generator and try to add an optimization pass?
Thanks for any feedback.
Ciao
Nat!
P.S. In case someone is interested, here is the assembler code and the IR that produced it.
Relevant LLVM generated x86_64 assembler portion with -Os
~~~
testq %r12, %r12
je LBB0_5
## BB#1:
movq -8(%r12), %rcx
movq (%rcx), %rax
movq -8(%rax), %rdx
andq %r15, %rdx
cmpq %r15, (%rax,%rdx)
je LBB0_2
## BB#3:
addq $8, %rcx
jmp LBB0_4
LBB0_2:
leaq 8(%rdx,%rax), %rcx
LBB0_4:
movq %r12, %rdi
movq %r15, %rsi
movq %r14, %rdx
callq *(%rcx)
movq %rax, %rbx
LBB0_5:
~~~
Bett...
2017 Nov 20 (2): Nowaday Scalar Evolution's Problem.
...e -ggdb0 -O3 -S)**
UnpredictableBackedgeTakenCountFunc1():
xor eax, eax ; eax = 0
cmp eax, 4 ; cmpv = (eax == 4)
jne .LBB0_2 ; if(cmpv == false) goto LBB0_2
jmp .LBB0_4 ; goto LBB0_4
.LBB0_5:
xor ecx, ecx ; ecx = 0
cmp eax, 7 ; cmpv = (eax == 7)
sete cl ; cl = cmpv
lea eax, [rax + rcx] ; eax = rax + rcx (address arithmetic, no load)
add eax, 1 ; eax++
cmp eax, 4...
2015 Sep 01 (2): [RFC] New pass: LoopExitValues
...signed int
*Src, unsigned int Val) {
for (int Outer = 0; Outer < Size; ++Outer)
for (int Inner = 0; Inner < Size; ++Inner)
Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;
}
With LoopExitValues
-------------------------------
matrix_mul:
testl %edi, %edi
je .LBB0_5
xorl %r9d, %r9d
xorl %r8d, %r8d
.LBB0_2:
xorl %r11d, %r11d
.LBB0_3:
movl %r9d, %r10d
movl (%rdx,%r10,4), %eax
imull %ecx, %eax
movl %eax, (%rsi,%r10,4)
incl %r11d
incl %r9d
cmpl %r11d, %edi
jne .LBB0_3
incl %r8d
cmpl %edi, %r8d
jne .LBB0_2
.LB...
2015 Aug 31 (2): [RFC] New pass: LoopExitValues
Hello LLVM,
This is a proposal for a new pass that improves performance and code
size in some nested loop situations. The pass is target independent.
From the description in the file header:
This optimization finds loop exit values reevaluated after the loop
execution and replaces them by the corresponding exit values if they
are available. Such sequences can arise after the
2018 Nov 27 (2): Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...~crc;
>> }
>>
>> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm>
>> (-O2)
>>
>> crc32be: # @crc32be
>> xor eax, eax
>> test esi, esi
>> jne .LBB0_2
>> jmp .LBB0_5
>> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> add rdi, 1
>> test esi, esi
>> je .LBB0_5
>> .LBB0_2: # =>This Loop Header: Depth=1
>> add esi, -1
>> movzx edx, byte ptr [rdi]
>> shl edx...
2014 Sep 02 (3): [LLVMdev] LICM promoting memory to scalar
...- -O3 -ffast-math -fslp-vectorize test.cpp
.text
.file "test.cpp"
.globl _Z3fooii
.align 2
.type _Z3fooii,@function
_Z3fooii: // @_Z3fooii
// BB#0: // %entry
cbz w0, .LBB0_5
// BB#1: // %for.body.lr.ph
mov w8, wzr
cmp w0, #0 // =0
cinc w9, w0, lt
asr w9, w9, #1
adrp x10, globalvar
.LBB0_2: // %for.body...
2011 Feb 18 (0): [LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote:
> Hello everyone,
>
> I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls".
> Of course, some instructions already have their twins, such as add/adds, and I left them untouched.
Adding separate "s" instructions is
2014 Jul 23 (4): [LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...xmm1, %xmm2
movdqa %xmm0, %xmm3
movdqu -16(%rdi), %xmm0
movdqu (%rdi), %xmm1
paddd %xmm3, %xmm0
paddd %xmm2, %xmm1
addq $32, %rdi
addq $-8, %rdx
jne .LBB0_3
# BB#4:
movq %r8, %rdi
movq %rax, %rdx
jmp .LBB0_5
.LBB0_1:
pxor %xmm1, %xmm1
.LBB0_5: # %middle.block
paddd %xmm1, %xmm0
movdqa %xmm0, %xmm1
movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1]
paddd %xmm0, %xmm1
pshufd $1, %xmm1, %xmm0 # xmm0 = xmm1[1,0,0,0]...
2011 Feb 18 (2): [LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone,
I've added the "S" suffixed versions of ARM and Thumb2 instructions to
tablegen. Those are, for example, "movs" or "muls".
Of course, some instructions already have their twins, such as add/adds,
and I left them untouched.
Besides, I propose the codegen optimization based on them, which removes the
redundant comparison in patterns like
orr
2015 Mar 03 (2): [LLVMdev] Need a clue to improve the optimization of some C code
...t;> Ciao
>> Nat!
>>
>>
>> P.S. In case someone is interested, here is the assembler code and the IR that produced it.
>>
>>
>>
>> Relevant LLVM generated x86_64 assembler portion with -Os
>> ~~~
>> testq %r12, %r12
>> je LBB0_5
>> ## BB#1:
>> movq -8(%r12), %rcx
>> movq (%rcx), %rax
>> movq -8(%rax), %rdx
>> andq %r15, %rdx
>> cmpq %r15, (%rax,%rdx)
>> je LBB0_2
>> ## BB#3:
>> addq $8, %rcx
>> jmp LBB0_4
>> LBB0_2:
>> leaq 8(%rdx,%rax), %rcx...
2014 Sep 02 (2): [LLVMdev] LICM promoting memory to scalar
...gt; .file "test.cpp"
>> .globl _Z3fooii
>> .align 2
>> .type _Z3fooii,@function
>> _Z3fooii: // @_Z3fooii
>> // BB#0: // %entry
>> cbz w0, .LBB0_5
>> // BB#1: // %for.body.lr.ph
>> mov w8, wzr
>> cmp w0, #0 // =0
>> cinc w9, w0, lt
>> asr w9, w9, #1
>> adrp x10, globalvar
>> .LBB0_2:...
2014 Sep 03 (3): [LLVMdev] LICM promoting memory to scalar
...- -O3 -ffast-math -fslp-vectorize test.cpp
.text
.file "test.cpp"
.globl _Z3fooii
.align 2
.type _Z3fooii,@function
_Z3fooii: // @_Z3fooii
// BB#0: // %entry
cbz w0, .LBB0_5
// BB#1: // %for.body.lr.ph
mov w8, wzr
cmp w0, #0 // =0
cinc w9, w0, lt
asr w9, w9, #1
adrp x10, globalvar
.LBB0_2: // %for.body
...
2013 Aug 06 (1): [LLVMdev] Patching jump tables at run-time
....quad .LBB0_2
.quad .LBB0_3
.quad .LBB0_4
.quad .LBB0_5
Based on some run-time conditions, I may want to change the behavior of the switch instruction by swapping the jump table entries for 0 and 2 at runtime. My...
2018 Nov 28 (2): Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...gt; (-O1) and <
>> https://godbolt.org/z/zeExHm>
>> >> (-O2)
>> >>
>> >> crc32be: # @crc32be
>> >> xor eax, eax
>> >> test esi, esi
>> >> jne .LBB0_2
>> >> jmp .LBB0_5
>> >> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> >> add rdi, 1
>> >> test esi, esi
>> >> je .LBB0_5
>> >> .LBB0_2: # =>This Loop Header: Depth=1
>> >> add esi, -1
>> >>...
2016 Aug 01 (2): LLVM Loop vectorizer - 2 vector.body blocks appear
Hello.
Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the
beginning of July 2016) I ran the following piece of C code:
void foo(long *A, long *B, long *C, long N) {
for (long i = 0; i < N; ++i) {
C[i] = A[i] + B[i];
}
}
The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
2017 Dec 19 (4): A code layout related side-effect introduced by rL318299
...-------- b.s generated from b.ll
----------------------------
~/workarea/llvm-r318298/dbuild/bin/opt -loop-rotate -S < b.ll
|~/workarea/llvm-r318298/dbuild/bin/llc
.cfi_startproc
# BB#0: # %entry
pushq %rax
.cfi_def_cfa_offset 16
movl $i, %eax
cmpq %rax, %rsi
ja .LBB0_5
# BB#1:
movl $i, %eax
.p2align 4, 0x90
.LBB0_3: # %while.body
# =>This Inner Loop Header: Depth=1
movq (%rdi), %rcx
movq %rcx, (%rsi)
movq 8(%rdi), %rcx
movq %rcx, (%rsi)
addq $6, %rsi
cmpq %rdx, %rsi
jae .LBB0_4
# BB#2:...