Displaying 20 results from an estimated 36 matches for "lbb0_4".
2015 Oct 27
4
How can I tell llvm that a branch is preferred?
..."branch"
or "switch". And __builtin_expect does nothing, of that I am sure.
Unfortunately llvm has this knack for ordering my one most crucial part
of code exactly the opposite of what I want; it emits: (x86_64)
cmpq %r15, (%rax,%rdx)
jne LBB0_3
Ltmp18:
leaq 8(%rax,%rdx), %rcx
jmp LBB0_4
LBB0_3:
addq $8, %rcx
LBB0_4:
when I want,
cmpq %r15, (%rax,%rdx)
je LBB0_3
addq $8, %rcx
jmp LBB0_4
LBB0_3:
leaq 8(%rax,%rdx), %rcx
LBB0_4:
since that saves me executing a jump 99.9% of the time. Is there
anything I can do?
Ciao
Nat!
2010 Oct 07
2
[LLVMdev] [Q] x86 peephole deficiency
Hi all,
I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
and now I am running into a deficiency of the x86
peephole optimizer (or jump-threader?). Here is what I get:
andl $3, %edi
je .LBB0_4
# BB#2: # %nz
# in Loop: Header=BB0_1
Depth=1
cmpl $2, %edi
je .LBB0_6
# BB#3: # %nz.non-middle
# in Loop: Header=BB0_1...
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...ere is the assembler code and the IR that produced it.
Relevant LLVM generated x86_64 assembler portion with -Os
~~~
testq %r12, %r12
je LBB0_5
## BB#1:
movq -8(%r12), %rcx
movq (%rcx), %rax
movq -8(%rax), %rdx
andq %r15, %rdx
cmpq %r15, (%rax,%rdx)
je LBB0_2
## BB#3:
addq $8, %rcx
jmp LBB0_4
LBB0_2:
leaq 8(%rdx,%rax), %rcx
LBB0_4:
movq %r12, %rdi
movq %r15, %rsi
movq %r14, %rdx
callq *(%rcx)
movq %rax, %rbx
LBB0_5:
~~~
Better/tighter assembler code would be (saves two instructions and one jump)
~~~
testq %r12, %r12
je LBB0_5
movq -8(%r12), %rcx
movq (%rcx), %rax
movq -8(%r...
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...e case, you might find this document useful:
> http://llvm.org/docs/Frontend/PerformanceTips.html
Maybe, if it would be sensible to rewrite the IR; I am wondering whether that is a good/useful idea. I don't know.
I basically need to get LLVM to emit
~~~
cmpq %r15, (%rax,%rdx)
jne LBB0_4
leaq 0(%rdx,%rax), %rcx
LBB0_4:
callq *8(%rcx)
~~~
instead of
~~~
cmpq %r15, (%rax,%rdx)
je LBB0_2
addq $8, %rcx
jmp LBB0_4
LBB0_2:
leaq 8(%rdx,%rax), %rcx
LBB0_4:
callq *(%rcx)
~~~
If I can do this by rewriting the IR, it would be nice, because it hopefully translates to other archite...
2017 Nov 20
2
Nowaday Scalar Evolution's Problem.
...; goto %1;
}
**ASSEMBLY OUTPUT (clang.exe -ggdb0 -O3 -S)**
UnpredictableBackedgeTakenCountFunc1():
xor eax, eax ; eax = 0
cmp eax, 4 ; cmpv = (eax == 4)
jne .LBB0_2 ; if(cmpv == false) goto LBB0_2
jmp .LBB0_4 ; goto LBB0_4
.LBB0_5:
xor ecx, ecx ; ecx = 0
cmp eax, 7 ; cmpv = (eax == 7)
sete cl ; cl = cmpv
lea eax, [rax + rcx] ; eax = rax + rcx (lea does not load)
add eax, 1...
2010 Oct 07
0
[LLVMdev] [Q] x86 peephole deficiency
...Gabor Greif wrote:
> Hi all,
>
> I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
> and now I am running into a deficiency of the x86
> peephole optimizer (or jump-threader?). Here is what I get:
>
>
> andl $3, %edi
> je .LBB0_4
> # BB#2: # %nz
> # in Loop: Header=BB0_1
> Depth=1
> cmpl $2, %edi
> je .LBB0_6
> # BB#3: # %nz.non-middle
>...
2017 Dec 19
4
A code layout related side-effect introduced by rL318299
...8/dbuild/bin/llc
.cfi_startproc
# BB#0: # %entry
pushq %rax
.cfi_def_cfa_offset 16
movl $i, %eax
.p2align 4, 0x90
.LBB0_1: # %while.cond
# =>This Inner Loop Header: Depth=1
cmpq %rax, %rsi
ja .LBB0_4
# BB#2: # %while.body
# in Loop: Header=BB0_1 Depth=1
movq (%rdi), %rcx
movq %rcx, (%rsi)
movq 8(%rdi), %rcx
movq %rcx, (%rsi)
addq $6, %rdi
addq $6, %rsi
cmpq %rdx, %rsi
jb .LBB0_1
# BB#3: # %...
2017 Dec 19
2
A code layout related side-effect introduced by rL318299
...>> pushq %rax
>> .cfi_def_cfa_offset 16
>> movl $i, %eax
>> .p2align 4, 0x90
>> .LBB0_1: # %while.cond
>> # =>This Inner Loop Header:
>> Depth=1
>> cmpq %rax, %rsi
>> ja .LBB0_4
>> # BB#2: # %while.body
>> # in Loop: Header=BB0_1 Depth=1
>> movq (%rdi), %rcx
>> movq %rcx, (%rsi)
>> movq 8(%rdi), %rcx
>> movq %rcx, (%rsi)
>> addq $6, %rdi
>> addq $6, %rs...
2017 May 30
3
[atomics][AArch64] Possible bug in cmpxchg lowering
...ew release
acquire
%v1 = extractvalue { i32, i1 } %v0, 1
ret i1 %v1
}
to the equivalent of the following on AArch64:
ldxr w8, [x0]
cmp w8, w1
b.ne .LBB0_3
// BB#1: // %cmpxchg.trystore
stlxr w8, w2, [x0]
cbz w8, .LBB0_4
// BB#2: // %cmpxchg.failure
mov w0, wzr
ret
.LBB0_3: // %cmpxchg.nostore
clrex
mov w0, wzr
ret
.LBB0_4:
orr w0, wzr, #0x1
ret
GCC instead generates a ldaxr for the initial load, which seems...
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...crc >>= 1; // rather poor!
}
return ~crc;
}
See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)
crc32be: # @crc32be
xor eax, eax
test esi, esi
jne .LBB0_2
jmp .LBB0_5
.LBB0_4: # in Loop: Header=BB0_2 Depth=1
add rdi, 1
test esi, esi
je .LBB0_5
.LBB0_2: # =>This Loop Header: Depth=1
add esi, -1
movzx edx, byte ptr [rdi]
shl edx, 24
xor edx, eax
mov ecx, -8
mov eax, edx
.LBB...
2014 Sep 02
3
[LLVMdev] LICM promoting memory to scalar
...cmp w0, #0 // =0
cinc w9, w0, lt
asr w9, w9, #1
adrp x10, globalvar
.LBB0_2: // %for.body
// =>This Inner Loop Header: Depth=1
cmp w8, w9
b.hs .LBB0_4
// BB#3: // %if.then
// in Loop: Header=BB0_2 Depth=1
ldr w11, [x10, :lo12:globalvar] <===== load inside loop
add w11, w11, w1
str w11, [x10, :lo12:globalvar] <...
2011 Feb 18
0
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote:
> Hello everyone,
>
> I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls".
> Of course, some instructions already have their twins, such as add/adds, and I left them untouched.
Adding separate "s" instructions is
2016 Jun 28
2
Instruction selection problem with type i64 - mistaken as v8i64?
...ody ], [ zeroinitializer,
%vector.body.preheader ]
The ASM code generated from it is the following:
LBB0_3: // %vector.body.preheader
REGVEC0 = 0
mov r0, 0
std -48(r10), r0
std -128(r10), REGVEC0
jmp LBB0_4
LBB0_4: // %vector.body
ldd REGVEC0, -128(r10)
ldd r0, -48(r10)
I am surprised that the BPF scalar instructions ldd and std use vector register
REGVEC0, which has type v8i64.
For example, the TableGen definition of the LOAD inst...
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
...State
Consider baseline x86 instructions like the following, which test three
conditions and if all pass, loads data from memory and potentially leaks it
through some side channel:
```
# %bb.0: # %entry
pushq %rax
testl %edi, %edi
jne .LBB0_4
# %bb.1: # %then1
testl %esi, %esi
jne .LBB0_4
# %bb.2: # %then2
testl %edx, %edx
je .LBB0_3
.LBB0_4: # %exit
popq %rax
retq
.LBB0_3:...
2011 Feb 18
2
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone,
I've added the "S" suffixed versions of ARM and Thumb2 instructions to
tablegen. Those are, for example, "movs" or "muls".
Of course, some instructions already have their twins, such as add/adds,
and I left them untouched.
Besides, I propose a codegen optimization based on them, which removes the
redundant comparison in patterns like
orr
2014 Sep 02
2
[LLVMdev] LICM promoting memory to scalar
...w9, w0, lt
>> asr w9, w9, #1
>> adrp x10, globalvar
>> .LBB0_2: // %for.body
>> // =>This Inner Loop Header: Depth=1
>> cmp w8, w9
>> b.hs .LBB0_4
>> // BB#3: // %if.then
>> // in Loop: Header=BB0_2 Depth=1
>> ldr w11, [x10, :lo12:globalvar] <===== load inside loop
>> add w11, w11, w1
>>...
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
...>>
>> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm>
>> (-O2)
>>
>> crc32be: # @crc32be
>> xor eax, eax
>> test esi, esi
>> jne .LBB0_2
>> jmp .LBB0_5
>> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> add rdi, 1
>> test esi, esi
>> je .LBB0_5
>> .LBB0_2: # =>This Loop Header: Depth=1
>> add esi, -1
>> movzx edx, byte ptr [rdi]
>> shl edx, 24
>>...
2013 Dec 13
0
[LLVMdev] GVNPRE /PRE is not effective
...movl $2147483647, %eax # imm = 0x7FFFFFFF
addl phi, %eax
cltd
idivl %ecx
movl %eax, sum
movl (%edi,%esi,4), %ecx
.LBB0_2: # %if.end
leal (,%ecx,4), %eax
cmpl $-14, %eax
jl .LBB0_4
# BB#3: # %if.then5
movl $2147483647, %eax # imm = 0x7FFFFFFF
addl phi, %eax
cltd
idivl %ecx
movl %eax, sum
.LBB0_4: # %if.end9
popl %esi
popl %edi
r...
2010 Oct 04
2
[LLVMdev] missing blocks
...s r0,r0,0
# BB#1: # %if.then
call abort
oris r0,r0,0
.LBB0_2: # %if.end
addi %r12, %r0, 0
addi %r2, %r12, 0
call special_format
oris r0,r0,0
subc r0, %r2, %r12
bne .LBB0_4
oris r0,r0,0
b .LBB0_3
oris r0,r0,0
# BB#3: # %if.then3
call abort
oris r0,r0,0
.LBB0_4: # %if.end4
addi %r2, %r0, 0
call exit
oris r0,r0,0
.Ltmp0:
.size...
2014 Sep 03
3
[LLVMdev] LICM promoting memory to scalar
... cmp w0, #0 // =0
cinc w9, w0, lt
asr w9, w9, #1
adrp x10, globalvar
.LBB0_2: // %for.body
// =>This Inner Loop Header: Depth=1
cmp w8, w9
b.hs .LBB0_4
// BB#3: // %if.then
// in Loop: Header=BB0_2 Depth=1
ldr w11, [x10, :lo12:globalvar] <===== load
inside loop
add w11, w11, w1
str w11, [x10, :lo12:globalvar] ...