Displaying 20 results from an estimated 36 matches for "lbb0_4".
2015 Oct 27
4
How can I tell llvm that a branch is preferred?
..."branch"
or "switch". And __builtin_expect does nothing, of that I am sure.
Unfortunately llvm has this knack for ordering my one most crucial part
of code exactly the opposite of what I want; it emits: (x86_64)
cmpq %r15, (%rax,%rdx)
jne LBB0_3
Ltmp18:
leaq 8(%rax,%rdx), %rcx
jmp LBB0_4
LBB0_3:
addq $8, %rcx
LBB0_4:
when I want,
cmpq %r15, (%rax,%rdx)
je LBB0_3
addq $8, %rcx
jmp LBB0_4
LBB0_3:
leaq 8(%rax,%rdx), %rcx
LBB0_4:
since that saves me executing a jump 99.9% of the time. Is there
anything I can do?
Ciao
Nat!
2010 Oct 07
2
[LLVMdev] [Q] x86 peephole deficiency
Hi all,
I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
and now I am running into a deficiency of the x86
peephole optimizer (or jump-threader?). Here is what I get:
andl $3, %edi
je .LBB0_4
# BB#2: # %nz
# in Loop: Header=BB0_1
Depth=1
cmpl $2, %edi
je .LBB0_6
# BB#3: # %nz.non-middle
# in Loop: Header=BB0_1...
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...ere is the assembler code and the IR that produced it.
Relevant LLVM generated x86_64 assembler portion with -Os
~~~
testq %r12, %r12
je LBB0_5
## BB#1:
movq -8(%r12), %rcx
movq (%rcx), %rax
movq -8(%rax), %rdx
andq %r15, %rdx
cmpq %r15, (%rax,%rdx)
je LBB0_2
## BB#3:
addq $8, %rcx
jmp LBB0_4
LBB0_2:
leaq 8(%rdx,%rax), %rcx
LBB0_4:
movq %r12, %rdi
movq %r15, %rsi
movq %r14, %rdx
callq *(%rcx)
movq %rax, %rbx
LBB0_5:
~~~
Better/tighter assembler code would be (saves two instructions and one jump)
~~~
testq %r12, %r12
je LBB0_5
movq -8(%r12), %rcx
movq (%rcx), %rax
movq -8(%r...
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...e case, you might find this document useful:
> http://llvm.org/docs/Frontend/PerformanceTips.html
Maybe, if it would be sensible to rewrite the IR; I am wondering whether that is a good/useful idea. I don't know.
I basically need to get LLVM to emit
~~~
cmpq %r15, (%rax,%rdx)
jne LBB0_4
leaq 0(%rdx,%rax), %rcx
LBB0_4:
callq *8(%rcx)
~~~
instead of
~~~
cmpq %r15, (%rax,%rdx)
je LBB0_2
addq $8, %rcx
jmp LBB0_4
LBB0_2:
leaq 8(%rdx,%rax), %rcx
LBB0_4:
callq *(%rcx)
~~~
If I can do this by rewriting the IR, it would be nice, because it hopefully translates to other archite...
2017 Nov 20
2
Nowaday Scalar Evolution's Problem.
...; goto %1;
}
**ASSEMBLY OUTPUT (clang.exe -ggdb0 -O3 -S)**
UnpredictableBackedgeTakenCountFunc1():
xor eax, eax ; eax = 0
cmp eax, 4 ; cmpv = (eax == 4)
jne .LBB0_2 ; if(cmpv == false) goto LBB0_2
jmp .LBB0_4 ; goto LBB0_4
.LBB0_5:
xor ecx, ecx ; ecx = 0
cmp eax, 7 ; cmpv = (eax == 7)
sete cl ; cl = cmpv
lea eax, [rax + rcx] ; eax = rax + rcx (lea does not load)
add eax, 1...
2010 Oct 07
0
[LLVMdev] [Q] x86 peephole deficiency
...Gabor Greif wrote:
> Hi all,
>
> I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
> and now I am running into a deficiency of the x86
> peephole optimizer (or jump-threader?). Here is what I get:
>
>
> andl $3, %edi
> je .LBB0_4
> # BB#2: # %nz
> # in Loop: Header=BB0_1
> Depth=1
> cmpl $2, %edi
> je .LBB0_6
> # BB#3: # %nz.non-middle
>...
2017 Dec 19
4
A code layout related side-effect introduced by rL318299
...8/dbuild/bin/llc
.cfi_startproc
# BB#0: # %entry
pushq %rax
.cfi_def_cfa_offset 16
movl $i, %eax
.p2align 4, 0x90
.LBB0_1: # %while.cond
# =>This Inner Loop Header: Depth=1
cmpq %rax, %rsi
ja .LBB0_4
# BB#2: # %while.body
# in Loop: Header=BB0_1 Depth=1
movq (%rdi), %rcx
movq %rcx, (%rsi)
movq 8(%rdi), %rcx
movq %rcx, (%rsi)
addq $6, %rdi
addq $6, %rsi
cmpq %rdx, %rsi
jb .LBB0_1
# BB#3: # %...
2017 Dec 19
2
A code layout related side-effect introduced by rL318299
...>> pushq %rax
>> .cfi_def_cfa_offset 16
>> movl $i, %eax
>> .p2align 4, 0x90
>> .LBB0_1: # %while.cond
>> # =>This Inner Loop Header:
>> Depth=1
>> cmpq %rax, %rsi
>> ja .LBB0_4
>> # BB#2: # %while.body
>> # in Loop: Header=BB0_1 Depth=1
>> movq (%rdi), %rcx
>> movq %rcx, (%rsi)
>> movq 8(%rdi), %rcx
>> movq %rcx, (%rsi)
>> addq $6, %rdi
>> addq $6, %rs...
2017 May 30
3
[atomics][AArch64] Possible bug in cmpxchg lowering
...ew release
acquire
%v1 = extractvalue { i32, i1 } %v0, 1
ret i1 %v1
}
to the equivalent of the following on AArch64:
ldxr w8, [x0]
cmp w8, w1
b.ne .LBB0_3
// BB#1: // %cmpxchg.trystore
stlxr w8, w2, [x0]
cbz w8, .LBB0_4
// BB#2: // %cmpxchg.failure
mov w0, wzr
ret
.LBB0_3: // %cmpxchg.nostore
clrex
mov w0, wzr
ret
.LBB0_4:
orr w0, wzr, #0x1
ret
GCC instead generates a ldaxr for the initial load, which seems...
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...crc >>= 1; // rather poor!
}
return ~crc;
}
See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)
crc32be: # @crc32be
xor eax, eax
test esi, esi
jne .LBB0_2
jmp .LBB0_5
.LBB0_4: # in Loop: Header=BB0_2 Depth=1
add rdi, 1
test esi, esi
je .LBB0_5
.LBB0_2: # =>This Loop Header: Depth=1
add esi, -1
movzx edx, byte ptr [rdi]
shl edx, 24
xor edx, eax
mov ecx, -8
mov eax, edx
.LBB...
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
...cmp w0, #0 // =0
cinc w9, w0, lt
asr w9, w9, #1
adrp x10, globalvar
.LBB0_2: // %for.body
// =>This Inner Loop Header: Depth=1
cmp w8, w9
b.hs .LBB0_4
// BB#3: // %if.then
// in Loop: Header=BB0_2 Depth=1
ldr w11, [x10, :lo12:globalvar] <===== load inside loop
add w11, w11, w1
str w11, [x10, :lo12:globalvar] <...
2011 Feb 18
0
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote:
> Hello everyone,
>
> I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls".
> Of course, some instructions already have their twins, such as add/adds, and I left them untouched.
Adding separate "s" instructions is
2016 Jun 28
2
Instruction selection problem with type i64 - mistaken as v8i64?
...ody ], [ zeroinitializer,
%vector.body.preheader ]
The ASM code generated from it is the following:
LBB0_3: // %vector.body.preheader
REGVEC0 = 0
mov r0, 0
std -48(r10), r0
std -128(r10), REGVEC0
jmp LBB0_4
LBB0_4: // %vector.body
ldd REGVEC0, -128(r10)
ldd r0, -48(r10)
I am surprised that the BPF scalar instructions ldd and std use vector register
REGVEC0, which has type v8i64.
For example, the TableGen definition of the LOAD inst...
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
...State
Consider baseline x86 instructions like the following, which test three
conditions and if all pass, loads data from memory and potentially leaks it
through some side channel:
```
# %bb.0: # %entry
pushq %rax
testl %edi, %edi
jne .LBB0_4
# %bb.1: # %then1
testl %esi, %esi
jne .LBB0_4
# %bb.2: # %then2
testl %edx, %edx
je .LBB0_3
.LBB0_4: # %exit
popq %rax
retq
.LBB0_3:...
2011 Feb 18
2
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone,
I've added the "S" suffixed versions of ARM and Thumb2 instructions to
tablegen. Those are, for example, "movs" or "muls".
Of course, some instructions already have their twins, such as add/adds,
and I left them untouched.
Besides, I propose a codegen optimization based on them, which removes the
redundant comparison in patterns like
orr
2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
...w9, w0, lt
>> asr w9, w9, #1
>> adrp x10, globalvar
>> .LBB0_2: // %for.body
>> // =>This Inner Loop Header: Depth=1
>> cmp w8, w9
>> b.hs .LBB0_4
>> // BB#3: // %if.then
>> // in Loop: Header=BB0_2 Depth=1
>> ldr w11, [x10, :lo12:globalvar] <===== load inside loop
>> add w11, w11, w1
>>...
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...>>
>> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm>
>> (-O2)
>>
>> crc32be: # @crc32be
>> xor eax, eax
>> test esi, esi
>> jne .LBB0_2
>> jmp .LBB0_5
>> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> add rdi, 1
>> test esi, esi
>> je .LBB0_5
>> .LBB0_2: # =>This Loop Header: Depth=1
>> add esi, -1
>> movzx edx, byte ptr [rdi]
>> shl edx, 24
>>...
2013 Dec 13
0
[LLVMdev] GVNPRE /PRE is not effective
...movl $2147483647, %eax # imm = 0x7FFFFFFF
addl phi, %eax
cltd
idivl %ecx
movl %eax, sum
movl (%edi,%esi,4), %ecx
.LBB0_2: # %if.end
leal (,%ecx,4), %eax
cmpl $-14, %eax
jl .LBB0_4
# BB#3: # %if.then5
movl $2147483647, %eax # imm = 0x7FFFFFFF
addl phi, %eax
cltd
idivl %ecx
movl %eax, sum
.LBB0_4: # %if.end9
popl %esi
popl %edi
r...
2010 Oct 04
2
[LLVMdev] missing blocks
...s r0,r0,0
# BB#1: # %if.then
call abort
oris r0,r0,0
.LBB0_2: # %if.end
addi %r12, %r0, 0
addi %r2, %r12, 0
call special_format
oris r0,r0,0
subc r0, %r2, %r12
bne .LBB0_4
oris r0,r0,0
b .LBB0_3
oris r0,r0,0
# BB#3: # %if.then3
call abort
oris r0,r0,0
.LBB0_4: # %if.end4
addi %r2, %r0, 0
call exit
oris r0,r0,0
.Ltmp0:
.size...
2014 Sep 03
3
[LLVMdev] LICM promoting memory to scalar
... cmp w0, #0 // =0
cinc w9, w0, lt
asr w9, w9, #1
adrp x10, globalvar
.LBB0_2: // %for.body
// =>This Inner Loop Header: Depth=1
cmp w8, w9
b.hs .LBB0_4
// BB#3: // %if.then
// in Loop: Header=BB0_2 Depth=1
ldr w11, [x10, :lo12:globalvar] <===== load
inside loop
add w11, w11, w1
str w11, [x10, :lo12:globalvar] ...