Displaying 20 results from an estimated 84 matches for "lbb0_1".
2018 Apr 04
0
SCEV and LoopStrengthReduction Formulae
> cmpq %rbx, %r14
> jne .LBB0_1
>
> LLVM can perform compare-jump fusion; it already does so in certain cases, but
> not in the case above. We could remove the cmp above by performing
> the following transformation:
Do you mean branch-fusion (https://en.wikichip.org/wiki/macro-operation_fusion)?
Is there any mor...
2018 Apr 03
4
SCEV and LoopStrengthReduction Formulae
..., but perhaps this should stand alone as its own pass:
// Example which can be optimized via cmp/jmp fusion.
// clang -O3 -S test.c
extern void g(int);
void f(int *p, long long n) {
do {
g(*p++);
} while (--n);
}
LLVM currently generates the following sequence for x86_64 targets:
.LBB0_1:
movl (%r15,%rbx,4), %edi
callq g
addq $1, %rbx
cmpq %rbx, %r14
jne .LBB0_1
LLVM can perform compare-jump fusion; it already does so in certain cases, but not
in the case above. We could remove the cmp above by performing
the following transformation:
1.0) Initialize the induction variabl...
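The steps of the proposed transformation are cut off above, so the following is only a sketch of one common way to make the loop-closing increment set the flags by itself: run an index from -n up to zero against a base of p + n. The names f_original, f_count_up_to_zero, g_hash and g_calls are hypothetical, and the g() defined here is a stand-in for the external g() in the post so that the two loop shapes can be compared. Whether a given compiler then drops the cmp is not guaranteed.

```c
#include <assert.h>

/* Hypothetical stand-in for the external g() in the post: it folds its
   arguments into an order-sensitive hash and counts the calls. */
unsigned long long g_hash = 0;
long long g_calls = 0;

void g(int v) {
    g_hash = g_hash * 31u + (unsigned long long)(unsigned int)v;
    g_calls++;
}

/* Loop shape from the post: decrement n and walk p forward.
   Like the original do/while, it requires n > 0. */
void f_original(const int *p, long long n) {
    assert(n > 0);
    do {
        g(*p++);
    } while (--n);
}

/* Assumed rewrite: the index starts at -n and counts up to 0, so the
   increment that closes the loop is also the exit test. */
void f_count_up_to_zero(const int *p, long long n) {
    assert(n > 0);
    const int *end = p + n;   /* one past the last element */
    long long i = -n;
    do {
        g(end[i]);            /* end[-n] is p[0], end[-1] is p[n-1] */
    } while (++i != 0);
}
```

Both functions visit the same elements in the same order; only the form of the induction variable differs.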
2011 Feb 18
2
[LLVMdev] EFLAGS and MVT::Glue
...g 8404. (I've seen the same assertion
failure in some other cases where I have more reason to think that
that's roughly what happened.)
A real example to consider might be code like this:
do {
a[i] -= b[i];
} while (a[i++] >= 0);
I'm currently getting ARM code like this:
.LBB0_1:
ldr r2, [r1], #4
ldr r3, [r0]
sub r2, r3, r2
str r2, [r0], #4
cmp r2, #0
bge .LBB0_1
This could be improved, I think, by getting the subtract to set the
flags instead of comparing with zero, like this:
.LBB0_1:
ldr...
2017 Jun 07
2
[RFC] Optimizing Comparisons Chains
...code when dealing with contiguous
member-by-member structural equality. Consider:
struct A {
bool operator==(const A& o) const { return i == o.i && j == o.j; }
uint32 i;
uint32 j;
};
This generates:
mov eax, dword ptr [rdi]
cmp eax, dword ptr [rsi]
jne .LBB0_1
mov eax, dword ptr [rdi + 4]
cmp eax, dword ptr [rsi + 4]
sete al
ret
.LBB0_1:
xor eax, eax
ret
I’ve been working on an LLVM pass that detects this pattern at IR level and
turns it into a memcmp() call. This generates more efficient code:
mov rax, qword ptr [r...
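As a rough illustration of the idea (not the pass itself), the member-by-member comparison and a single memcmp() give the same answer when the struct has no padding. The sketch below is in C rather than C++, and the names eq_fields and eq_memcmp are hypothetical; the static_assert states the no-padding assumption that the equivalence depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct A {
    uint32_t i;
    uint32_t j;
};

/* The equivalence below only holds if there are no padding bytes. */
static_assert(sizeof(struct A) == 2 * sizeof(uint32_t),
              "struct A must not contain padding");

/* Member-by-member equality, as in the operator== from the post. */
bool eq_fields(const struct A *a, const struct A *b) {
    return a->i == b->i && a->j == b->j;
}

/* The same test expressed as one comparison over the whole object. */
bool eq_memcmp(const struct A *a, const struct A *b) {
    return memcmp(a, b, sizeof *a) == 0;
}
```

For integer members without padding the two functions agree on every input; members such as floats, where bitwise and value equality differ, would not qualify.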
2015 Mar 23
3
[LLVMdev] Changing The '.' Used to Prefix Labels in Assembly Output
I'm working on an LLVM back end with output to assembly file (.s). I'm
using the ARM assembly printer.
The generated labels (e.g. for a while statement) start with '.' like
.LBB0_1
I would like to change the '.' to something else (specifically $$ if it
matters).
I see a lot of customizability in targetinfo.cpp but not that particular
item.
Where should I be looking?
2012 Oct 02
4
[LLVMdev] interesting possible compiler bug
...vmpb/install/bin/clang loop.c -O2 -S
.file "loop.c"
.text
.globl main
.align 16, 0x90
.type main,@function
main: # @main
.cfi_startproc
# BB#0: # %entry
.align 16, 0x90
.LBB0_1: # %do.body
# =>This Inner Loop Header: Depth=1
jmp .LBB0_1
.Ltmp0:
.size main, .Ltmp0-main
.cfi_endproc
.section ".note.GNU-stack","",@progbits
2018 Sep 20
3
Comparing Clang and GCC: only clang stores updated value in each iteration.
...t=1
.text
.file "testfun.i"
.globl b # -- Begin function b
.p2align 4
.type b,@function
b: # @b
# %bb.0: # %entry
lrl %r0, a
.LBB0_1: # %do.body
# =>This Inner Loop Header: Depth=1
cije %r0, 0, .LBB0_3
# %bb.2: # %if.then
# in Loop: Header=BB0_1 Depth=1
ahi ...
2011 Mar 24
2
[LLVMdev] GCC vs. LLVM difference on simple code example
...here is the code produced by llvm-gcc 4.2.1:
.file "foo.c"
.text
.globl foo
.align 16, 0x90
.type foo,@function
foo:
pushl %ebp
movl %esp, %ebp
movl $1, %eax
movl a, %ecx
.align 16, 0x90
.LBB0_1:
* movl b, %edx*
addl (%edx,%eax,4), %ecx
movl %ecx, a
incl %eax
cmpl $100, %eax
jne .LBB0_1
popl %ebp
ret
.Ltmp0:
.size foo, .Ltmp0-foo
.section .note.GNU-stack,"",@progbits...
2011 Feb 07
1
[LLVMdev] Post-inc combining
...cmp r12, r2
mov lr, r12
blt .LBB0_3
, which does not seem to be auto-incrementing, I think.
I wonder what I should do to get loops auto-incing generally, for instance in this simple loop:
for(i=0;i<256;i++)
{
s+=a[i];
}
, which now yields
.LBB0_1: @ %for.body
@ =>This Inner Loop Header: Depth=1
ldr r3, [r0, r2]
add r2, r2, #4
add r1, r3, r1
cmp r2, #1, 22 @ 1024
bne .LBB0_1
, which uses r0 as base addre...
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
...%rcx
sarq $63, %rcx
shrq $62, %rcx
addq %r8, %rcx
sarq $2, %rcx
movq %rax, %rdx
shlq $5, %rdx
leaq 16(%r9,%rdx), %rsi
orq $16, %rdx
movq 16(%rsp), %rdi
addq %rdx, %rdi
addq 8(%rsp), %rdx
.align 16, 0x90
.LBB0_1:
vmovaps -16(%rdx), %xmm0
vmovaps (%rdx), %xmm1
vmovaps -16(%rdi), %xmm2
vmovaps (%rdi), %xmm3
vmulps %xmm3, %xmm1, %xmm4
vmulps %xmm2, %xmm1, %xmm1
vfmadd213ss %xmm4, %xmm0, %xmm2
vfmsub213ss %xmm1, %xmm0, %xmm3
vmovaps %xmm2,...
2017 Nov 28
3
storing MBB MCSymbol in custom section
...en only compiling (incompletely via -S),
e.g. $ clang calc_pi.c -o calc_pi -S,
it does compile with the expected outputs and the basic block labels
properly generated in my section. I get a file with proper labels
e.g.
__custom_section:
.long 19
.quad .LBB0_0
.long 1
.quad .LBB0_1
.long 28
.quad .LBB0_2
.long 3
... etc.
TLDR: compiles correctly, will not link successfully to make a binary
because the symbol of the basic block in the .text section doesn't
exist. How do I ensure the Basic Block Symbol does not get destroyed
until my section is read in as...
2011 Apr 15
0
[LLVMdev] Scheduling - WAW Dependencies
...efore
any of the other flag-affecting nodes, such as SU10. One such schedule
is 8-13-12-14-3-11-2-7-10-5-6-9-4-1-0 (which is, incidentally, what my
scheduler produces). However, when we look at the code generated from
such a schedule, we see this:
--------------------------------------------------
.LBB0_1: # %bb
# =>This Inner Loop Header: Depth=1
addq $-123, %rdx
leaq (%rax,%rax), %rsi
imulq %rdx, %rsi
decl %ecx
leaq 1(%rax,%rax), %rax
addq $456, %rsi # imm = 0x1C8
movq %rsi, %rdx
jne .LBB0_1
------------...
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
...n lowering
>> the IR to machine code. However, the generated assembly doesn't seem to
>> support this assumption :-(
>>
>>
>> main:
>> .cfi_startproc
>> xorl %eax, %eax
>> xorl %esi, %esi
>> .align 16, 0x90
>> .LBB0_1:
>> vmovups (%r8,%rax), %xmm0
>> vaddps (%rcx,%rax), %xmm0, %xmm0
>> vmovups %xmm0, (%rdx,%rax)
>> addq $4, %rsi
>> addq $16, %rax
>> cmpq $61, %rsi
>> jb .LBB0_1
>> retq
>>
>> I played...
2012 Jan 04
1
[LLVMdev] How can I compile a c source file to use SSE2 Data Movement Instructions?
...fragile-abi
-fdiagnostics-show-option -fcolor-diagnostics -o test.s -x c test.c
.def _f;
.scl 2;
.type 32;
.endef
.text
.globl _f
.align 16, 0x90
_f: # @f
# BB#0:
movl $-800, %eax # imm = 0xFFFFFFFFFFFFFCE0
movsd _DA, %xmm0
.align 16, 0x90
LBB0_1: # =>This Inner Loop Header: Depth=1
movsd _X+800(%eax), %xmm1
mulsd %xmm0, %xmm1
movsd _Y+800(%eax), %xmm2
subsd %xmm1, %xmm2
movsd %xmm2, _Y+800(%eax)
addl $8, %eax
jne LBB0_1
# BB#2:
xorl %eax, %eax
ret
.data
.globl _DA # @DA
.al...
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
...addq %r8, %rcx
> sarq $2, %rcx
> movq %rax, %rdx
> shlq $5, %rdx
> leaq 16(%r9,%rdx), %rsi
> orq $16, %rdx
> movq 16(%rsp), %rdi
> addq %rdx, %rdi
> addq 8(%rsp), %rdx
> .align 16, 0x90
> .LBB0_1:
> vmovaps -16(%rdx), %xmm0
> vmovaps (%rdx), %xmm1
> vmovaps -16(%rdi), %xmm2
> vmovaps (%rdi), %xmm3
> vmulps %xmm3, %xmm1, %xmm4
> vmulps %xmm2, %xmm1, %xmm1
> vfmadd213ss %xmm4, %xmm0, %xmm2
> vfmsub213ss %x...
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
...ttached IR with LLVM 3.6
llc -march=x86-64 -o f.S f.ll
it generates an aligned ADDPS with unaligned address. See attached f.S,
here an extract:
addq $12, %r9 # 12 is not a multiple of 16, thus for xmm0 this is unaligned
xorl %esi, %esi
.align 16, 0x90
.LBB0_1: # %loop2
# =>This Inner Loop Header: Depth=1
movq offset_array3(,%rsi,8), %rdi
movq offset_array2(,%rsi,8), %r10
movss -28(%rax), %xmm0
movss -8(%rax), %xmm1
movss -4...
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
...gisters get used when lowering the IR to machine code. However, the generated assembly doesn't seem to support this assumption :-(
>>
>>
>> main:
>> .cfi_startproc
>> xorl %eax, %eax
>> xorl %esi, %esi
>> .align 16, 0x90
>> .LBB0_1:
>> vmovups (%r8,%rax), %xmm0
>> vaddps (%rcx,%rax), %xmm0, %xmm0
>> vmovups %xmm0, (%rdx,%rax)
>> addq $4, %rsi
>> addq $16, %rax
>> cmpq $61, %rsi
>> jb .LBB0_1
>> retq
>>
>> I played with -m...
2013 Dec 12
0
[LLVMdev] AVX code gen
...## @f
.cfi_startproc
## BB#0: ## %entry
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
xorl %eax, %eax
.align 4, 0x90
LBB0_1: ## %vector.body
## =>This Inner Loop Header: Depth=1
vmovups (%rdx,%rax,4), %ymm0
vmulps (%rsi,%rax,4), %ymm0, %ymm0
vaddps (%rdi,%rax,4), %ymm0, %ymm0
vmovups %ymm0, (%rdi,%rax,4)...
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
..., %rcx
>> movq %rax, %rdx
>> shlq $5, %rdx
>> leaq 16(%r9,%rdx), %rsi
>> orq $16, %rdx
>> movq 16(%rsp), %rdi
>> addq %rdx, %rdi
>> addq 8(%rsp), %rdx
>> .align 16, 0x90
>> .LBB0_1:
>> vmovaps -16(%rdx), %xmm0
>> vmovaps (%rdx), %xmm1
>> vmovaps -16(%rdi), %xmm2
>> vmovaps (%rdi), %xmm3
>> vmulps %xmm3, %xmm1, %xmm4
>> vmulps %xmm2, %xmm1, %xmm1
>> vfmadd213ss %xmm4, %xmm0...
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...cmovaq %rcx, %rax
movq %rdi, %rsi
notq %rsi
addq %rax, %rsi
shrq $2, %rsi
incq %rsi
xorl %edx, %edx
movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8
andq %rsi, %rax
pxor %xmm0, %xmm0
je .LBB0_1
# BB#2: # %vector.body.preheader
leaq (%rdi,%rax,4), %r8
addq $16, %rdi
movq %rsi, %rdx
andq $-8, %rdx
pxor %xmm0, %xmm0
pxor %xmm1, %xmm1
.align 16, 0x90
.LBB0_3: # %vector...