search for: lbb0_4

Displaying 20 results from an estimated 36 matches for "lbb0_4".

2015 Oct 27
4
How can I tell llvm, that a branch is preferred ?
..."branch" or "switch". And __builtin_expect does nothing, that I am sure of. Unfortunately llvm has a knack for ordering the one most crucial part of my code exactly the opposite of what I want. It emits (x86_64): cmpq %r15, (%rax,%rdx) jne LBB0_3 Ltmp18: leaq 8(%rax,%rdx), %rcx jmp LBB0_4 LBB0_3: addq $8, %rcx LBB0_4: when I want: cmpq %r15, (%rax,%rdx) jeq LBB0_3 addq $8, %rcx jmp LBB0_4 LBB0_3: leaq 8(%rax,%rdx), %rcx LBB0_4: since that saves me executing a jump 99.9% of the time. Is there anything I can do? Ciao Nat!
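
For readers unfamiliar with the hint the post refers to (spelled `__builtin_expect` in GCC and Clang), here is a minimal sketch of how it is usually written in C. The `LIKELY`/`UNLIKELY` macros and the `select_offset` function are hypothetical names chosen for illustration, not code from the thread. As the poster observes, the builtin is only a hint, and the compiler remains free to lay out the blocks differently.

```c
#include <stddef.h>

/* Hypothetical convenience macros around the GCC/Clang builtin. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical function: picks an offset depending on whether the key
 * matches. The match is hinted as the common case. */
size_t select_offset(size_t key, size_t expected, size_t hot, size_t cold)
{
    if (LIKELY(key == expected))
        return hot + 8;   /* hinted as the frequently taken path */
    return cold + 8;      /* hinted as the rarely taken path */
}
```

Whether the hinted path ends up as the fall-through can only be checked by inspecting the generated assembly for the target in question.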
2010 Oct 07
2
[LLVMdev] [Q] x86 peephole deficiency
Hi all, I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125) and now I am running into a deficiency of the x86 peephole optimizer (or jump-threader?). Here is what I get: andl $3, %edi je .LBB0_4 # BB#2: # %nz # in Loop: Header=BB0_1 Depth=1 cmpl $2, %edi je .LBB0_6 # BB#3: # %nz.non-middle # in Loop: Header=BB0_1...
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...ere is the assembler code and the IR that produced it. Relevant LLVM generated x86_64 assembler portion with -Os ~~~ testq %r12, %r12 je LBB0_5 ## BB#1: movq -8(%r12), %rcx movq (%rcx), %rax movq -8(%rax), %rdx andq %r15, %rdx cmpq %r15, (%rax,%rdx) je LBB0_2 ## BB#3: addq $8, %rcx jmp LBB0_4 LBB0_2: leaq 8(%rdx,%rax), %rcx LBB0_4: movq %r12, %rdi movq %r15, %rsi movq %r14, %rdx callq *(%rcx) movq %rax, %rbx LBB0_5: ~~~ Better/tighter assembler code would be (saves 2 instructions, one jump less) ~~~ testq %r12, %r12 je LBB0_5 movq -8(%r12), %rcx movq (%rcx), %rax movq -8(%r...
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...e case, you might find this document useful: > http://llvm.org/docs/Frontend/PerformanceTips.html Maybe yes, if it would be sensible to rewrite the IR, which I am wondering, if that is a good/useful idea ? I don't know. I basically need to get LLVM to emit ~~~ cmpq %r15, (%rax,%rdx) jne LBB0_4 leaq 0(%rdx,%rax), %rcx LBB0_4: callq *8(%rcx) ~~~ instead of ~~~ cmpq %r15, (%rax,%rdx) je LBB0_2 addq $8, %rcx jmp LBB0_4 LBB0_2: leaq 8(%rdx,%rax), %rcx LBB0_4: callq *(%rcx) ~~~ If I can do this by rewriting the IR, it would be nice, because it hopefully translates to other archite...
2017 Nov 20
2
Nowaday Scalar Evolution's Problem.
...; goto %1; } **ASSEMBLY OUTPUT (clang.exe -ggdb0 -O3 -S)** UnpredictableBackedgeTakenCountFunc1(): xor eax, eax ; eax = 0 cmp eax, 4 ; cmpv = (eax == 4) jne .LBB0_2 ; if(cmpv == false) goto LBB0_2 jmp .LBB0_4 ; goto LBB0_4 .LBB0_5: xor ecx, ecx ; ecx = 0 cmp eax, 7 ; cmpv = (ecx == 7) sete cl ; cl = cmpv lea eax, [rax + rcx] ; eax = *(rax + rcx) add eax, 1...
2010 Oct 07
0
[LLVMdev] [Q] x86 peephole deficiency
...Gabor Greif wrote: > Hi all, > > I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125) > and now I am running into a deficiency of the x86 > peephole optimizer (or jump-threader?). Here is what I get: > > > andl $3, %edi > je .LBB0_4 > # BB#2: # %nz > # in Loop: Header=BB0_1 > Depth=1 > cmpl $2, %edi > je .LBB0_6 > # BB#3: # %nz.non-middle >...
2017 Dec 19
4
A code layout related side-effect introduced by rL318299
...8/dbuild/bin/llc .cfi_startproc # BB#0: # %entry pushq %rax .cfi_def_cfa_offset 16 movl $i, %eax .p2align 4, 0x90 .LBB0_1: # %while.cond # =>This Inner Loop Header: Depth=1 cmpq %rax, %rsi ja .LBB0_4 # BB#2: # %while.body # in Loop: Header=BB0_1 Depth=1 movq (%rdi), %rcx movq %rcx, (%rsi) movq 8(%rdi), %rcx movq %rcx, (%rsi) addq $6, %rdi addq $6, %rsi cmpq %rdx, %rsi jb .LBB0_1 # BB#3: # %...
2017 Dec 19
2
A code layout related side-effect introduced by rL318299
...>> pushq %rax >> .cfi_def_cfa_offset 16 >> movl $i, %eax >> .p2align 4, 0x90 >> .LBB0_1: # %while.cond >> # =>This Inner Loop Header: >> Depth=1 >> cmpq %rax, %rsi >> ja .LBB0_4 >> # BB#2: # %while.body >> # in Loop: Header=BB0_1 Depth=1 >> movq (%rdi), %rcx >> movq %rcx, (%rsi) >> movq 8(%rdi), %rcx >> movq %rcx, (%rsi) >> addq $6, %rdi >> addq $6, %rs...
2017 May 30
3
[atomics][AArch64] Possible bug in cmpxchg lowering
...ew _*release acquire*_ %v1 = extractvalue { i32, i1 } %v0, 1 ret i1 %v1 } to the equivalent of the following on AArch64: _*ldxr w8, [x0]*_ cmp w8, w1 b.ne .LBB0_3 // BB#1: // %cmpxchg.trystore stlxr w8, w2, [x0] cbz w8, .LBB0_4 // BB#2: // %cmpxchg.failure mov w0, wzr ret .LBB0_3: // %cmpxchg.nostore clrex mov w0, wzr ret .LBB0_4: orr w0, wzr, #0x1 ret GCC instead generates a ldaxr for the initial load, which seems...
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...crc >>= 1; // rather poor! } return ~crc; } See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2) crc32be: # @crc32be xor eax, eax test esi, esi jne .LBB0_2 jmp .LBB0_5 .LBB0_4: # in Loop: Header=BB0_2 Depth=1 add rdi, 1 test esi, esi je .LBB0_5 .LBB0_2: # =>This Loop Header: Depth=1 add esi, -1 movzx edx, byte ptr [rdi] shl edx, 24 xor edx, eax mov ecx, -8 mov eax, edx .LBB...
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
...cmp w0, #0 // =0 cinc w9, w0, lt asr w9, w9, #1 adrp x10, globalvar .LBB0_2: // %for.body // =>This Inner Loop Header: Depth=1 cmp w8, w9 b.hs .LBB0_4 // BB#3: // %if.then // in Loop: Header=BB0_2 Depth=1 ldr w11, [x10, :lo12:globalvar] <===== load inside loop add w11, w11, w1 str w11, [x10, :lo12:globalvar] <...
2011 Feb 18
0
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote: > Hello everyone, > > I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". > Of course, some instructions already have their twins, such as add/adds, and I left them untouched. Adding separate "s" instructions is
2016 Jun 28
2
Instruction selection problem with type i64 - mistaken as v8i64?
...ody ], [ zeroinitializer, %vector.body.preheader ] The ASM code generated from it is the following: LBB0_3: // %vector.body.preheader REGVEC0 = 0 mov r0, 0 std -48(r10), r0 std -128(r10), REGVEC0 jmp LBB0_4 LBB0_4: // %vector.body ldd REGVEC0, -128(r10) ldd r0, -48(r10) I am surprised that the BPF scalar instructions ldd and std use vector register REGVEC0, which have type v8i64. For example, the TableGen definition of the LOAD inst...
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
...State Consider baseline x86 instructions like the following, which test three conditions and if all pass, loads data from memory and potentially leaks it through some side channel: ``` # %bb.0: # %entry pushq %rax testl %edi, %edi jne .LBB0_4 # %bb.1: # %then1 testl %esi, %esi jne .LBB0_4 # %bb.2: # %then2 testl %edx, %edx je .LBB0_3 .LBB0_4: # %exit popq %rax retq .LBB0_3:...
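
A minimal C sketch of the kind of source that could produce such a sequence: three conditions are tested, and only if all pass is a value loaded and handed to a sink. This is an illustration under assumptions, not the code from the RFC. The names `example`, `leak`, and `leaked_value` are hypothetical, and `leak` merely stands in for a side channel.

```c
/* Hypothetical sink standing in for a side channel. */
int leaked_value;

void leak(int value)
{
    leaked_value = value;
}

/* Hypothetical function: the load of *ptr happens only when all three
 * conditions pass architecturally. Under misprediction, a CPU may still
 * execute the load speculatively. */
void example(int a, int b, int c, const int *ptr)
{
    if (a == 0) {
        if (b == 0) {
            if (c == 0) {
                leak(*ptr);
            }
        }
    }
}
```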
2011 Feb 18
2
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone, I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls". Of course, some instructions already have their twins, such as add/adds, and I left them untouched. Besides, I propose the codegen optimization based on them, which removes the redundant comparison in patterns like orr
2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
...w9, w0, lt >> asr w9, w9, #1 >> adrp x10, globalvar >> .LBB0_2: // %for.body >> // =>This Inner Loop Header: Depth=1 >> cmp w8, w9 >> b.hs .LBB0_4 >> // BB#3: // %if.then >> // in Loop: Header=BB0_2 Depth=1 >> ldr w11, [x10, :lo12:globalvar] <===== load inside loop >> add w11, w11, w1 >>...
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...>> >> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> >> (-O2) >> >> crc32be: # @crc32be >> xor eax, eax >> test esi, esi >> jne .LBB0_2 >> jmp .LBB0_5 >> .LBB0_4: # in Loop: Header=BB0_2 Depth=1 >> add rdi, 1 >> test esi, esi >> je .LBB0_5 >> .LBB0_2: # =>This Loop Header: Depth=1 >> add esi, -1 >> movzx edx, byte ptr [rdi] >> shl edx, 24 >>...
2013 Dec 13
0
[LLVMdev] GVNPRE /PRE is not effective
...movl $2147483647, %eax # imm = 0x7FFFFFFF addl phi, %eax cltd idivl %ecx movl %eax, sum movl (%edi,%esi,4), %ecx .LBB0_2: # %if.end leal (,%ecx,4), %eax cmpl $-14, %eax jl .LBB0_4 # BB#3: # %if.then5 movl $2147483647, %eax # imm = 0x7FFFFFFF addl phi, %eax cltd idivl %ecx movl %eax, sum .LBB0_4: # %if.end9 popl %esi popl %edi r...
2010 Oct 04
2
[LLVMdev] missing blocks
...s r0,r0,0 # BB#1: # %if.then call abort oris r0,r0,0 .LBB0_2: # %if.end addi %r12, %r0, 0 addi %r2, %r12, 0 call special_format oris r0,r0,0 subc r0, %r2, %r12 bne .LBB0_4 oris r0,r0,0 b .LBB0_3 oris r0,r0,0 # BB#3: # %if.then3 call abort oris r0,r0,0 .LBB0_4: # %if.end4 addi %r2, %r0, 0 call exit oris r0,r0,0 .Ltmp0: .size...
2014 Sep 03
3
[LLVMdev] LICM promoting memory to scalar
...cmp w0, #0 // =0 cinc w9, w0, lt asr w9, w9, #1 adrp x10, globalvar .LBB0_2: // %for.body // =>This Inner Loop Header: Depth=1 cmp w8, w9 b.hs .LBB0_4 // BB#3: // %if.then // in Loop: Header=BB0_2 Depth=1 ldr w11, [x10, :lo12:globalvar] <===== load inside loop add w11, w11, w1 str w11, [x10, :lo12:globalvar] ...