Displaying 20 results from an estimated 25 matches for "lbb0_5".
2019 Jun 30 (6): [hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
...M 9.0 :
.text
.file "main.c"
.globl hexagon2 // -- Begin function hexagon2
.p2align 2
.type hexagon2,@function
hexagon2: // @hexagon2
// %bb.0: // %entry.old
{
p0 = cmp.gtu(r0,r1); if (p0.new) jump:nt .LBB0_5
r2 = r0
allocframe(#0)
} // encoding: [A,0x41'A',A,0x15'A',0x00,0x3c,0x02,0x70]
// fixup A - offset: 0, value: .LBB0_5, kind: fixup_Hexagon_B9_PCREL
// %bb.1: // %entry.old
{...
2015 Sep 30 (2): Optimizing jumps to identical code blocks
...:
https://gist.github.com/ranma42/d2e6d50999e801ffd4ed
(based on two examples available in Rust issues:
https://github.com/rust-lang/rust/pull/24270#issuecomment-136681741
https://github.com/rust-lang/rust/issues/13623#issuecomment-136700526 )
In "enum4.s"
cmpl $1, %eax
je LBB0_5
cmpl $2, %eax
je LBB0_5
cmpl $3, %eax
LBB0_5:
could be removed.
(Further optimization would be possible by observing that the two 32-bit
comparisons could be unified into a single 64-bit comparison, but I believe
this is a different issue)
In "enum6.s" all of the ele...
2019 Jul 01 (0): [hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
...align 2
.type hexagon2,@function
hexagon2: // @hexagon2
// %bb.0: // %entry.old
{
p0 = cmp.gtu(r0,r1); if (p0.new) jump:nt .LBB0_5
r2 = r0
allocframe(#0)
} // encoding: [A,0x41'A',A,0x15'A',0x00,0x3c,0x02,0x70]
// fixup A...
2018 Nov 06 (4): Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...is
crc >>= 1; // rather poor!
}
return ~crc;
}
See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)
crc32be: # @crc32be
xor eax, eax
test esi, esi
jne .LBB0_2
jmp .LBB0_5
.LBB0_4: # in Loop: Header=BB0_2 Depth=1
add rdi, 1
test esi, esi
je .LBB0_5
.LBB0_2: # =>This Loop Header: Depth=1
add esi, -1
movzx edx, byte ptr [rdi]
shl edx, 24
xor edx, eax
mov ecx, -8
mov eax,...
2015 Mar 03 (2): [LLVMdev] Need a clue to improve the optimization of some C code
...hould I rather go to the code generator and try to add an optimization pass?
Thanks for any feedback.
Ciao
Nat!
P.S. In case someone is interested, here is the assembler code and the IR that produced it.
Relevant LLVM generated x86_64 assembler portion with -Os
~~~
testq %r12, %r12
je LBB0_5
## BB#1:
movq -8(%r12), %rcx
movq (%rcx), %rax
movq -8(%rax), %rdx
andq %r15, %rdx
cmpq %r15, (%rax,%rdx)
je LBB0_2
## BB#3:
addq $8, %rcx
jmp LBB0_4
LBB0_2:
leaq 8(%rdx,%rax), %rcx
LBB0_4:
movq %r12, %rdi
movq %r15, %rsi
movq %r14, %rdx
callq *(%rcx)
movq %rax, %rbx
LBB0_5:
~~~
Bett...
2017 Nov 20 (2): Nowaday Scalar Evolution's Problem.
...e -ggdb0 -O3 -S)**
UnpredictableBackedgeTakenCountFunc1():
xor eax, eax ; eax = 0
cmp eax, 4 ; cmpv = (eax == 4)
jne .LBB0_2 ; if(cmpv == false) goto LBB0_2
jmp .LBB0_4 ; goto LBB0_4
.LBB0_5:
xor ecx, ecx ; ecx = 0
cmp eax, 7 ; cmpv = (eax == 7)
sete cl ; cl = cmpv
lea eax, [rax + rcx] ; eax = rax + rcx (address arithmetic, no load)
add eax, 1 ; eax++
cmp eax, 4...
2015 Sep 01 (2): [RFC] New pass: LoopExitValues
...signed int
*Src, unsigned int Val) {
for (int Outer = 0; Outer < Size; ++Outer)
for (int Inner = 0; Inner < Size; ++Inner)
Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;
}
With LoopExitValues
-------------------------------
matrix_mul:
testl %edi, %edi
je .LBB0_5
xorl %r9d, %r9d
xorl %r8d, %r8d
.LBB0_2:
xorl %r11d, %r11d
.LBB0_3:
movl %r9d, %r10d
movl (%rdx,%r10,4), %eax
imull %ecx, %eax
movl %eax, (%rsi,%r10,4)
incl %r11d
incl %r9d
cmpl %r11d, %edi
jne .LBB0_3
incl %r8d
cmpl %edi, %r8d
jne .LBB0_2
.LB...
2015 Aug 31 (2): [RFC] New pass: LoopExitValues
Hello LLVM,
This is a proposal for a new pass that improves performance and code
size in some nested loop situations. The pass is target independent.
From the description in the file header:
This optimization finds loop exit values reevaluated after the loop
execution and replaces them by the corresponding exit values if they
are available. Such sequences can arise after the
2018 Nov 27 (2): Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...~crc;
>> }
>>
>> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm>
>> (-O2)
>>
>> crc32be: # @crc32be
>> xor eax, eax
>> test esi, esi
>> jne .LBB0_2
>> jmp .LBB0_5
>> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> add rdi, 1
>> test esi, esi
>> je .LBB0_5
>> .LBB0_2: # =>This Loop Header: Depth=1
>> add esi, -1
>> movzx edx, byte ptr [rdi]
>> shl edx...
2014 Sep 02 (3): [LLVMdev] LICM promoting memory to scalar
...- -O3 -ffast-math -fslp-vectorize test.cpp
.text
.file "test.cpp"
.globl _Z3fooii
.align 2
.type _Z3fooii,@function
_Z3fooii: // @_Z3fooii
// BB#0: // %entry
cbz w0, .LBB0_5
// BB#1: // %for.body.lr.ph
mov w8, wzr
cmp w0, #0 // =0
cinc w9, w0, lt
asr w9, w9, #1
adrp x10, globalvar
.LBB0_2: // %for.body...
2011 Feb 18 (0): [LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote:
> Hello everyone,
>
> I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls".
> Of course, some instructions already have their twins, such as add/adds, and I left them untouched.
Adding separate "s" instructions is
2014 Jul 23 (4): [LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...xmm1, %xmm2
movdqa %xmm0, %xmm3
movdqu -16(%rdi), %xmm0
movdqu (%rdi), %xmm1
paddd %xmm3, %xmm0
paddd %xmm2, %xmm1
addq $32, %rdi
addq $-8, %rdx
jne .LBB0_3
# BB#4:
movq %r8, %rdi
movq %rax, %rdx
jmp .LBB0_5
.LBB0_1:
pxor %xmm1, %xmm1
.LBB0_5: # %middle.block
paddd %xmm1, %xmm0
movdqa %xmm0, %xmm1
movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1]
paddd %xmm0, %xmm1
pshufd $1, %xmm1, %xmm0 # xmm0 = xmm1[1,0,0,0]...
2011 Feb 18 (2): [LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone,
I've added the "S" suffixed versions of ARM and Thumb2 instructions to
tablegen. Those are, for example, "movs" or "muls".
Of course, some instructions already have their twins, such as add/adds,
and I left them untouched.
Besides, I propose the codegen optimization based on them, which removes the
redundant comparison in patterns like
orr
2015 Mar 03 (2): [LLVMdev] Need a clue to improve the optimization of some C code
...t;> Ciao
>> Nat!
>>
>>
>> P.S. In case someone is interested, here is the assembler code and the IR that produced it.
>>
>>
>>
>> Relevant LLVM generated x86_64 assembler portion with -Os
>> ~~~
>> testq %r12, %r12
>> je LBB0_5
>> ## BB#1:
>> movq -8(%r12), %rcx
>> movq (%rcx), %rax
>> movq -8(%rax), %rdx
>> andq %r15, %rdx
>> cmpq %r15, (%rax,%rdx)
>> je LBB0_2
>> ## BB#3:
>> addq $8, %rcx
>> jmp LBB0_4
>> LBB0_2:
>> leaq 8(%rdx,%rax), %rcx...
2014 Sep 02 (2): [LLVMdev] LICM promoting memory to scalar
...gt; .file "test.cpp"
>> .globl _Z3fooii
>> .align 2
>> .type _Z3fooii,@function
>> _Z3fooii: // @_Z3fooii
>> // BB#0: // %entry
>> cbz w0, .LBB0_5
>> // BB#1: // %for.body.lr.ph
>> mov w8, wzr
>> cmp w0, #0 // =0
>> cinc w9, w0, lt
>> asr w9, w9, #1
>> adrp x10, globalvar
>> .LBB0_2:...
2014 Sep 03 (3): [LLVMdev] LICM promoting memory to scalar
...- -O3 -ffast-math -fslp-vectorize test.cpp
.text
.file "test.cpp"
.globl _Z3fooii
.align 2
.type _Z3fooii,@function
_Z3fooii: // @_Z3fooii
// BB#0: // %entry
cbz w0, .LBB0_5
// BB#1: // %for.body.lr.ph
mov w8, wzr
cmp w0, #0 // =0
cinc w9, w0, lt
asr w9, w9, #1
adrp x10, globalvar
.LBB0_2: // %for.body
...
2013 Aug 06 (1): [LLVMdev] Patching jump tables at run-time
....quad .LBB0_2
.quad .LBB0_3
.quad .LBB0_4
.quad .LBB0_5
Based on some run-time conditions, I may want to change the behavior of the switch instruction by swapping the jump table entries for 0 and 2 at runtime. My...
2018 Nov 28 (2): Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...gt; (-O1) and <
>> https://godbolt.org/z/zeExHm>
>> >> (-O2)
>> >>
>> >> crc32be: # @crc32be
>> >> xor eax, eax
>> >> test esi, esi
>> >> jne .LBB0_2
>> >> jmp .LBB0_5
>> >> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> >> add rdi, 1
>> >> test esi, esi
>> >> je .LBB0_5
>> >> .LBB0_2: # =>This Loop Header: Depth=1
>> >> add esi, -1
>> >>...
2016 Aug 01 (2): LLVM Loop vectorizer - 2 vector.body blocks appear
Hello.
Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the
beginning of July 2016) I ran the following piece of C code:
void foo(long *A, long *B, long *C, long N) {
for (long i = 0; i < N; ++i) {
C[i] = A[i] + B[i];
}
}
The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
2017 Dec 19 (4): A code layout related side-effect introduced by rL318299
...-------- b.s generated from b.ll
----------------------------
~/workarea/llvm-r318298/dbuild/bin/opt -loop-rotate -S < b.ll
|~/workarea/llvm-r318298/dbuild/bin/llc
.cfi_startproc
# BB#0: # %entry
pushq %rax
.cfi_def_cfa_offset 16
movl $i, %eax
cmpq %rax, %rsi
ja .LBB0_5
# BB#1:
movl $i, %eax
.p2align 4, 0x90
.LBB0_3: # %while.body
# =>This Inner Loop Header: Depth=1
movq (%rdi), %rcx
movq %rcx, (%rsi)
movq 8(%rdi), %rcx
movq %rcx, (%rsi)
addq $6, %rsi
cmpq %rdx, %rsi
jae .LBB0_4
# BB#2:...