search for: lbb0_5

Displaying 20 results from an estimated 25 matches for "lbb0_5".

2019 Jun 30
6
[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
...M 9.0 : .text .file "main.c" .globl hexagon2 // -- Begin function hexagon2 .p2align 2 .type hexagon2,@function hexagon2: // @hexagon2 // %bb.0: // %entry.old { p0 = cmp.gtu(r0,r1); if (p0.new) jump:nt .LBB0_5 r2 = r0 allocframe(#0) } // encoding: [A,0x41'A',A,0x15'A',0x00,0x3c,0x02,0x70] // fixup A - offset: 0, value: .LBB0_5, kind: fixup_Hexagon_B9_PCREL // %bb.1: // %entry.old {...
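For context, a hedged sketch of the kind of C that clang lowers to the llvm.uadd.with.overflow intrinsic named in the subject; the function and its shape are assumptions, not the poster's main.c:

~~~
/* Hypothetical example: on unsigned operands, clang lowers
   __builtin_add_overflow to llvm.uadd.with.overflow, the intrinsic
   whose interaction with Hexagon hardware-loop generation this
   thread discusses. */
unsigned add_or_zero(unsigned a, unsigned b) {
    unsigned r;
    if (__builtin_add_overflow(a, b, &r))   /* overflow bit of llvm.uadd */
        return 0;
    return r;
}
~~~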
2015 Sep 30
2
Optimizing jumps to identical code blocks
...: https://gist.github.com/ranma42/d2e6d50999e801ffd4ed (based on two examples available in Rust issues: https://github.com/rust-lang/rust/pull/24270#issuecomment-136681741 https://github.com/rust-lang/rust/issues/13623#issuecomment-136700526 ) In "enum4.s" cmpl $1, %eax je LBB0_5 cmpl $2, %eax je LBB0_5 cmpl $3, %eax LBB0_5: could be removed. (Further optimization would be possible by observing that the two 32-bit comparisons could be unified into a single 64-bit comparison, but I believe this is a different issue) In "enum6.s" all of the ele...
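A hypothetical C analogue of the pattern the poster describes (the originals are Rust enums, linked above):

~~~
/* Cases 1, 2, and 3 reach byte-identical blocks, so the duplicated
   compare-and-jump sequence targeting LBB0_5 could be folded into a
   single range check, or the identical targets merged. */
int classify(int tag) {
    switch (tag) {
    case 1: return 7;
    case 2: return 7;
    case 3: return 7;
    default: return 0;
    }
}
~~~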
2019 Jul 01
0
[hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
...align 2 .type hexagon2,@function hexagon2: // @hexagon2 // %bb.0: // %entry.old { p0 = cmp.gtu(r0,r1); if (p0.new) jump:nt .LBB0_5 r2 = r0 allocframe(#0) } // encoding: [A,0x41'A',A,0x15'A',0x00,0x3c,0x02,0x70] // fixup A...
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...is crc >>= 1; // rather poor! } return ~crc; } See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2) crc32be: # @crc32be xor eax, eax test esi, esi jne .LBB0_2 jmp .LBB0_5 .LBB0_4: # in Loop: Header=BB0_2 Depth=1 add rdi, 1 test esi, esi je .LBB0_5 .LBB0_2: # =>This Loop Header: Depth=1 add esi, -1 movzx edx, byte ptr [rdi] shl edx, 24 xor edx, eax mov ecx, -8 mov eax,...
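A hedged reconstruction of the crc32be under discussion; only fragments are visible in the excerpt, so the polynomial, argument types, and loop shape here are assumptions:

~~~
/* Hypothetical reconstruction: the shl edx, 24 in the generated code
   and the visible return ~crc suggest a bit-at-a-time MSB-first
   CRC-32; everything else is guessed. */
unsigned crc32be(const unsigned char *p, unsigned len) {
    unsigned crc = 0;
    while (len--) {
        crc ^= (unsigned)*p++ << 24;          /* the shl edx, 24 */
        for (int i = 0; i < 8; ++i)           /* one bit per pass */
            crc = crc & 0x80000000u ? (crc << 1) ^ 0x04c11db7u
                                    : crc << 1;
    }
    return ~crc;
}
~~~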
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...hould I rather go to the code generator and try to add an optimization pass? Thanks for any feedback. Ciao Nat! P.S. In case someone is interested, here is the assembler code and the IR that produced it. Relevant LLVM generated x86_64 assembler portion with -Os ~~~ testq %r12, %r12 je LBB0_5 ## BB#1: movq -8(%r12), %rcx movq (%rcx), %rax movq -8(%rax), %rdx andq %r15, %rdx cmpq %r15, (%rax,%rdx) je LBB0_2 ## BB#3: addq $8, %rcx jmp LBB0_4 LBB0_2: leaq 8(%rdx,%rax), %rcx LBB0_4: movq %r12, %rdi movq %r15, %rsi movq %r14, %rdx callq *(%rcx) movq %rax, %rbx LBB0_5: ~~~ Bett...
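Read loosely, the assembly is a guarded dynamic dispatch: a NULL test, a masked hash probe keyed on a selector, then an indirect call. A loose sketch of that shape, with every name invented:

~~~
typedef long (*method_fn)(void *obj, long sel, long arg);

/* Hypothetical sketch: only the control flow follows the excerpt's
   -Os assembly; names and data layout are invented. */
long dispatch(void *obj, long sel, long arg) {
    if (!obj)
        return 0;                        /* testq %r12,%r12 / je LBB0_5 */
    void **table = *((void ***)obj - 1); /* movq -8(%r12), %rcx */
    long *cache = (long *)table[0];      /* movq (%rcx), %rax */
    long idx = sel & cache[-1];          /* mask stored before the cache */
    method_fn *slot = cache[idx] == sel
        ? (method_fn *)&cache[idx + 1]   /* hit: entry next to the key */
        : (method_fn *)&table[1];        /* miss: fallback slot */
    return (*slot)(obj, sel, arg);       /* callq *(%rcx) */
}
~~~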
2017 Nov 20
2
Nowaday Scalar Evolution's Problem.
...e -ggdb0 -O3 -S)** UnpredictableBackedgeTakenCountFunc1(): xor eax, eax ; eax = 0 cmp eax, 4 ; cmpv = (eax == 4) jne .LBB0_2 ; if(cmpv == false) goto LBB0_2 jmp .LBB0_4 ; goto LBB0_4 .LBB0_5: xor ecx, ecx ; ecx = 0 cmp eax, 7 ; cmpv = (eax == 7) sete cl ; cl = cmpv lea eax, [rax + rcx] ; eax = rax + rcx add eax, 1 ; eax++ cmp eax, 4...
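Reconstructing the loop from the annotated assembly gives roughly the following; this is a hedged guess at the original function, not the poster's source:

~~~
/* Hedged reconstruction: the per-iteration step depends on a
   comparison against 7 (sete cl / lea eax, [rax + rcx]), so scalar
   evolution cannot express the backedge-taken count in closed form. */
unsigned UnpredictableBackedgeTakenCountFunc1(void) {
    unsigned x = 0;
    while (x != 4) {
        x += (x == 7);   /* data-dependent step */
        x += 1;          /* add eax, 1 */
    }
    return x;
}
~~~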
2015 Sep 01
2
[RFC] New pass: LoopExitValues
...signed int *Src, unsigned int Val) { for (int Outer = 0; Outer < Size; ++Outer) for (int Inner = 0; Inner < Size; ++Inner) Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val; } With LoopExitValues ------------------------------- matrix_mul: testl %edi, %edi je .LBB0_5 xorl %r9d, %r9d xorl %r8d, %r8d .LBB0_2: xorl %r11d, %r11d .LBB0_3: movl %r9d, %r10d movl (%rdx,%r10,4), %eax imull %ecx, %eax movl %eax, (%rsi,%r10,4) incl %r11d incl %r9d cmpl %r11d, %edi jne .LBB0_3 incl %r8d cmpl %edi, %r8d jne .LBB0_2 .LB...
2015 Aug 31
2
[RFC] New pass: LoopExitValues
Hello LLVM, This is a proposal for a new pass that improves performance and code size in some nested loop situations. The pass is target independent. From the description in the file header: This optimization finds loop exit values reevaluated after the loop execution and replaces them by the corresponding exit values if they are available. Such sequences can arise after the
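A small illustration of the sequence the header describes (my own example, not the RFC's):

~~~
/* Illustrative only: at loop exit p already holds buf + n, so a later
   evaluation of buf + n can reuse that available exit value instead
   of being recomputed. */
const char *scan(const char *buf, int n) {
    const char *p = buf;
    for (int i = 0; i < n; ++i)
        p++;             /* exit value of p is buf + n */
    return buf + n;      /* re-evaluation the pass would replace with p */
}
~~~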
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...~crc; >> } >> >> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> >> (-O2) >> >> crc32be: # @crc32be >> xor eax, eax >> test esi, esi >> jne .LBB0_2 >> jmp .LBB0_5 >> .LBB0_4: # in Loop: Header=BB0_2 Depth=1 >> add rdi, 1 >> test esi, esi >> je .LBB0_5 >> .LBB0_2: # =>This Loop Header: Depth=1 >> add esi, -1 >> movzx edx, byte ptr [rdi] >> shl edx...
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
...- -O3 -ffast-math -fslp-vectorize test.cpp .text .file "test.cpp" .globl _Z3fooii .align 2 .type _Z3fooii,@function _Z3fooii: // @_Z3fooii // BB#0: // %entry cbz w0, .LBB0_5 // BB#1: // %for.body.lr.ph mov w8, wzr cmp w0, #0 // =0 cinc w9, w0, lt asr w9, w9, #1 adrp x10, globalvar .LBB0_2: // %for.body...
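A hedged guess at the source (the mangled _Z3fooii is foo(int, int)); the body is assumed, not taken from the thread:

~~~
/* Hypothetical reconstruction: the cbz suggests an early exit for
   a == 0, the cinc/asr pair a signed a / 2 trip count, and the adrp
   a global that LICM promotes to a scalar register for the duration
   of the loop. */
int globalvar;

void foo(int a, int b) {
    for (int i = 0; i < a / 2; ++i)
        globalvar += b;   /* promoted: one load before, one store after */
}
~~~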
2011 Feb 18
0
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote: > Hello everyone, > > I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". > Of course, some instructions have already had their twins, such as add/adds, and I left them untouched. Adding separate "s" instructions is
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...xmm1, %xmm2 movdqa %xmm0, %xmm3 movdqu -16(%rdi), %xmm0 movdqu (%rdi), %xmm1 paddd %xmm3, %xmm0 paddd %xmm2, %xmm1 addq $32, %rdi addq $-8, %rdx jne .LBB0_3 # BB#4: movq %r8, %rdi movq %rax, %rdx jmp .LBB0_5 .LBB0_1: pxor %xmm1, %xmm1 .LBB0_5: # %middle.block paddd %xmm1, %xmm0 movdqa %xmm0, %xmm1 movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1] paddd %xmm0, %xmm1 pshufd $1, %xmm1, %xmm0 # xmm0 = xmm1[1,0,0,0]...
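The paddd accumulators and the lane shuffles after .LBB0_5 are the classic shape of a vectorized integer-sum reduction; a minimal sketch of a loop that produces it (assumed, not the poster's original):

~~~
/* Minimal sketch, assuming a plain integer-sum reduction: the
   vectorizer keeps partial sums in xmm registers (the paddd pairs)
   and the middle.block after .LBB0_5 folds the four lanes together
   (movhlps / pshufd / paddd). */
int sum(const int *p, long n) {
    int s = 0;
    for (long i = 0; i < n; ++i)
        s += p[i];
    return s;
}
~~~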
2011 Feb 18
2
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone, I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". Of course, some instructions have already had their twins, such as add/adds, and I left them untouched. Besides, I propose the codegen optimization based on them, which removes the redundant comparison in patterns like orr
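A small example of the pattern the proposal targets (my illustration, not from the thread):

~~~
/* Illustration: when the OR is emitted as ORRS on ARM/Thumb2 it
   already sets the condition flags, so testing r against zero needs
   no separate CMP instruction. */
int either_set(unsigned a, unsigned b) {
    unsigned r = a | b;
    return r != 0;   /* foldable into the ORRS flags */
}
~~~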
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...>> Ciao >> Nat! >> >> >> P.S. In case someone is interested, here is the assembler code and the IR that produced it. >> >> >> >> Relevant LLVM generated x86_64 assembler portion with -Os >> ~~~ >> testq %r12, %r12 >> je LBB0_5 >> ## BB#1: >> movq -8(%r12), %rcx >> movq (%rcx), %rax >> movq -8(%rax), %rdx >> andq %r15, %rdx >> cmpq %r15, (%rax,%rdx) >> je LBB0_2 >> ## BB#3: >> addq $8, %rcx >> jmp LBB0_4 >> LBB0_2: >> leaq 8(%rdx,%rax), %rcx...
2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
...>> .file "test.cpp" >> .globl _Z3fooii >> .align 2 >> .type _Z3fooii,@function >> _Z3fooii: // @_Z3fooii >> // BB#0: // %entry >> cbz w0, .LBB0_5 >> // BB#1: // %for.body.lr.ph >> mov w8, wzr >> cmp w0, #0 // =0 >> cinc w9, w0, lt >> asr w9, w9, #1 >> adrp x10, globalvar >> .LBB0_2:...
2014 Sep 03
3
[LLVMdev] LICM promoting memory to scalar
...- -O3 -ffast-math -fslp-vectorize test.cpp .text .file "test.cpp" .globl _Z3fooii .align 2 .type _Z3fooii,@function _Z3fooii: // @_Z3fooii // BB#0: // %entry cbz w0, .LBB0_5 // BB#1: // %for.body.lr.ph mov w8, wzr cmp w0, #0 // =0 cinc w9, w0, lt asr w9, w9, #1 adrp x10, globalvar .LBB0_2: // %for.body ...
2013 Aug 06
1
[LLVMdev] Patching jump tables at run-time
....quad .LBB0_2 .quad .LBB0_3 .quad .LBB0_4 .quad .LBB0_5 Based on some run-time conditions, I may want to change the behavior of the switch instruction by swapping the jump table entries for 0 and 2 at runtime. My...
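For reference, a dense switch of the kind that lowers to such a table (a generic sketch; the poster's actual switch is not in the excerpt):

~~~
/* Generic sketch: a dense switch over small integers lowers to an
   indirect branch through a table of block addresses like the
   .quad .LBB0_* entries above; patching that table at run time would
   redirect where cases 0 and 2 land. */
int act(int op) {
    switch (op) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    case 4: return 14;
    default: return -1;
    }
}
~~~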
2018 Nov 28
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...> (-O1) and <https://godbolt.org/z/zeExHm> >> >> (-O2) >> >> >> >> crc32be: # @crc32be >> >> xor eax, eax >> >> test esi, esi >> >> jne .LBB0_2 >> >> jmp .LBB0_5 >> >> .LBB0_4: # in Loop: Header=BB0_2 Depth=1 >> >> add rdi, 1 >> >> test esi, esi >> >> je .LBB0_5 >> >> .LBB0_2: # =>This Loop Header: Depth=1 >> >> add esi, -1 >> >>...
2016 Aug 01
2
LLVM Loop vectorizer - 2 vector.body blocks appear
Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code: void foo(long *A, long *B, long *C, long N) { for (long i = 0; i < N; ++i) { C[i] = A[i] + B[i]; } } The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
2017 Dec 19
4
A code layout related side-effect introduced by rL318299
...-------- b.s generated from b.ll ---------------------------- ~/workarea/llvm-r318298/dbuild/bin/opt -loop-rotate -S < b.ll |~/workarea/llvm-r318298/dbuild/bin/llc .cfi_startproc # BB#0: # %entry pushq %rax .cfi_def_cfa_offset 16 movl $i, %eax cmpq %rax, %rsi ja .LBB0_5 # BB#1: movl $i, %eax .p2align 4, 0x90 .LBB0_3: # %while.body # =>This Inner Loop Header: Depth=1 movq (%rdi), %rcx movq %rcx, (%rsi) movq 8(%rdi), %rcx movq %rcx, (%rsi) addq $6, %rsi cmpq %rdx, %rsi jae .LBB0_4 # BB#2:...