search for: lbb0_1

Displaying 20 results from an estimated 84 matches for "lbb0_1".

2018 Apr 04
0
SCEV and LoopStrengthReduction Formulae
> cmpq %rbx, %r14 > jne .LBB0_1 > > LLVM can perform compare-jump fusion, it already does in certain cases, but > not in the case above. We can remove the cmp above if we were to perform > the following transformation: Do you mean branch-fusion (https://en.wikichip.org/wiki/macro-operation_fusion)? Is there any mor...
2018 Apr 03
4
SCEV and LoopStrengthReduction Formulae
..., but perhaps this should stand alone as its own pass: // Example which can be optimized via cmp/jmp fusion. // clang -O3 -S test.c extern void g(int); void f(int *p, long long n) { do { g(*p++); } while (--n); } LLVM currently generates the following sequence for x86_64 targets: LBB0_1: movl (%r15,%rbx,4), %edi callq g addq $1, %rbx cmpq %rbx, %r14 jne .LBB0_1 LLVM can perform compare-jump fusion, it already does in certain cases, but not in the case above. We can remove the cmp above if we were to perform the following transformation: 1.0) Initialize the induction variabl...
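A source-level rendering of the proposed transformation (hypothetical, for illustration only; the real change would live in LoopStrengthReduce rather than in user code) biases the induction variable so it counts up to zero, letting the increment itself produce the flags the branch needs:

    // Hypothetical sketch of the transformation described above; names are
    // invented, and the actual optimization would be done by LSR, not by hand.
    extern void g(int);

    void f_transformed(int *p, long long n) {
      int *end = p + n;     /* precompute the end of the range              */
      long long i = -n;     /* biased induction variable, counts up to zero */
      do {
        g(end[i]);          /* end + i visits the same elements as p++ did  */
        ++i;                /* the add sets ZF when i reaches zero ...      */
      } while (i != 0);     /* ... so the branch needs no separate cmpq     */
    }

With the exit test folded into the induction-variable update, the x86 loop latch can shrink to addq $1 / jne with no cmpq, which is the compare-jump fusion opportunity the post describes.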
2011 Feb 18
2
[LLVMdev] EFLAGS and MVT::Glue
...g 8404. (I've seen the same assertion failure in some other cases where I have more reason to think that that's roughly what happened.) A real example to consider might be code like this: do { a[i] -= b[i]; } while (a[i++] >= 0); I'm currently getting ARM code like this: .LBB0_1: ldr r2, [r1], #4 ldr r3, [r0] sub r2, r3, r2 str r2, [r0], #4 cmp r2, #0 bge .LBB0_1 This could be improved, I think, by getting the subtract to set the flags instead of comparing with zero, like this: .LBB0_1: ldr...
2017 Jun 07
2
[RFC] Optimizing Comparisons Chains
...code when dealing with contiguous member-by-member structural equality. Consider: struct A { bool operator==(const A& o) const { return i == o.i && j == o.j; } uint32 i; uint32 j; }; This generates: mov eax, dword ptr [rdi] cmp eax, dword ptr [rsi] jne .LBB0_1 mov eax, dword ptr [rdi + 4] cmp eax, dword ptr [rsi + 4] sete al ret .LBB0_1: xor eax, eax ret I’ve been working on an LLVM pass that detects this pattern at IR level and turns it into a memcmp() call. This generates more efficient code: mov rax, qword ptr [r...
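For this particular struct the effect of the proposed pass can be mimicked by hand, because the two 32-bit members are contiguous and the type has no padding, so bytewise equality coincides with member-by-member equality (a sketch, using uint32_t for the post's uint32):

    #include <cstdint>
    #include <cstring>

    struct A {
      // Hand-written equivalent of what the pass derives automatically:
      // a single 8-byte memcmp instead of a chain of compares and branches.
      bool operator==(const A& o) const {
        return std::memcmp(this, &o, sizeof(A)) == 0;
      }
      uint32_t i;
      uint32_t j;
    };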
2015 Mar 23
3
[LLVMdev] Changing The '.' Used to Prefix Labels in Assembly Output
I'm working on an LLVM back end with output to assembly file (.s). I'm using the ARM assembly printer. The generated labels (e.g. for a while statement) start with '.', like .LBB0_1. I would like to change the '.' to something else (specifically $$ if it matters). I see a lot of customizability in targetinfo.cpp but not that particular item. Where should I be looking?
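The '.' in labels such as .LBB0_1 normally comes from the private label/global prefix in the target's MCAsmInfo rather than from TargetInfo.cpp; a sketch of the kind of change involved (class name and exact fields are assumptions, unverified against any particular LLVM release):

    // Sketch only: the ".L" prefix for assembler-local labels is a property
    // of the target's MCAsmInfo, so overriding it there is one plausible
    // place to substitute "$$".
    #include "llvm/MC/MCAsmInfoELF.h"

    class MyARMELFMCAsmInfo : public llvm::MCAsmInfoELF {  // hypothetical subclass
    public:
      MyARMELFMCAsmInfo() {
        PrivateGlobalPrefix = "$$";  // prefix for local symbols
        PrivateLabelPrefix  = "$$";  // prefix for block labels, e.g. $$BB0_1
      }
    };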
2012 Oct 02
4
[LLVMdev] interesting possible compiler bug
...vmpb/install/bin/clang loop.c -O2 -S .file "loop.c" .text .globl main .align 16, 0x90 .type main,@function main: # @main .cfi_startproc # BB#0: # %entry .align 16, 0x90 .LBB0_1: # %do.body # =>This Inner Loop Header: Depth=1 jmp .LBB0_1 .Ltmp0: .size main, .Ltmp0-main .cfi_endproc .section ".note.GNU-stack","",@progbits
2018 Sep 20
3
Comparing Clang and GCC: only clang stores updated value in each iteration.
...t=1 .text .file "testfun.i" .globl b # -- Begin function b .p2align 4 .type b,@function b: # @b # %bb.0: # %entry lrl %r0, a .LBB0_1: # %do.body # =>This Inner Loop Header: Depth=1 cije %r0, 0, .LBB0_3 # %bb.2: # %if.then # in Loop: Header=BB0_1 Depth=1 ahi ...
2011 Mar 24
2
[LLVMdev] GCC vs. LLVM difference on simple code example
...here is the code produced by llvm-gcc 4.2.1: .file "foo.c" .text .globl foo .align 16, 0x90 .type foo,@function foo: pushl %ebp movl %esp, %ebp movl $1, %eax movl a, %ecx .align 16, 0x90 .LBB0_1: * movl b, %edx* addl (%edx,%eax,4), %ecx movl %ecx, a incl %eax cmpl $100, %eax jne .LBB0_1 popl %ebp ret .Ltmp0: .size foo, .Ltmp0-foo .section .note.GNU-stack,"",@progbits...
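The source is not shown in the snippet; a hypothetical reconstruction consistent with that output (globals a and b, with the pointer b re-read from memory on every iteration instead of being hoisted) would be:

    /* Hypothetical reconstruction of foo.c, inferred from the llvm-gcc
       output above: i runs from 1 to 99 and each iteration reloads the
       global pointer b (the starred movl) before indexing it. */
    int  a;
    int *b;

    void foo(void) {
      for (int i = 1; i != 100; ++i)
        a += b[i];
    }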
2011 Feb 07
1
[LLVMdev] Post-inc combining
...cmp r12, r2 mov lr, r12 blt .LBB0_3 , which does not seem to be auto-incrementing, I think. I wonder what I should do to get loops auto-incing generally, for instance in this simple loop: for(i=0;i<256;i++) { s+=a[i]; } , which now yields .LBB0_1: @ %for.body @ =>This Inner Loop Header: Depth=1 ldr r3, [r0, r2] add r2, r2, #4 add r1, r3, r1 cmp r2, #1, 22 @ 1024 bne .LBB0_1 , which uses r0 as base addre...
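One common way to make the post-incremented form more likely is to walk a pointer instead of indexing off a fixed base (a sketch; whether the backend actually emits ldr r3, [r0], #4 still depends on LoopStrengthReduce and the target's addressing-mode hooks):

    // Sketch: pointer-walking form of the same reduction.  The load through
    // p followed by ++p is the natural candidate for a post-incremented ldr.
    int sum256(const int *a) {
      int s = 0;
      for (const int *p = a, *end = a + 256; p != end; ++p)
        s += *p;
      return s;
    }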
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
...%rcx sarq $63, %rcx shrq $62, %rcx addq %r8, %rcx sarq $2, %rcx movq %rax, %rdx shlq $5, %rdx leaq 16(%r9,%rdx), %rsi orq $16, %rdx movq 16(%rsp), %rdi addq %rdx, %rdi addq 8(%rsp), %rdx .align 16, 0x90 .LBB0_1: vmovaps -16(%rdx), %xmm0 vmovaps (%rdx), %xmm1 vmovaps -16(%rdi), %xmm2 vmovaps (%rdi), %xmm3 vmulps %xmm3, %xmm1, %xmm4 vmulps %xmm2, %xmm1, %xmm1 vfmadd213ss %xmm4, %xmm0, %xmm2 vfmsub213ss %xmm1, %xmm0, %xmm3 vmovaps %xmm2,...
2017 Nov 28
3
storing MBB MCSymbol in custom section
...en only compiling (incompletely via -S), e.g. $ clang calc_pi.c -o calc_pi -S, it does compile with the expected outputs and the basic block labels properly generated in my section. I get a file with proper labels e.g. __custom_section: .long 19 .quad .LBB0_0 .long 1 .quad .LBB0_1 .long 28 .quad .LBB0_2 .long 3 ... etc. TLDR: compiles correctly, will not link successfully to make a binary because the symbol of the basic block in the .text section doesn't exist. How do I ensure the Basic Block Symbol does not get destroyed until my section is read in as...
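One plausible direction, sketched below and not a confirmed answer from the thread: .LBB0_* names carry the private-label prefix, so the assembler treats them as temporary symbols and they never reach the object file's symbol table; emitting an extra non-temporary symbol at each block and referencing that from the custom section sidesteps the problem. The helper name and prefix here are invented.

    // Sketch, not the thread's confirmed fix: emit a keepable symbol per
    // basic block.  Any name without the private-label prefix (".L" on ELF)
    // survives into the object's symbol table, unlike .LBB0_1 and friends.
    #include "llvm/ADT/Twine.h"
    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/MC/MCContext.h"
    #include "llvm/MC/MCStreamer.h"

    llvm::MCSymbol *emitKeptBlockSymbol(llvm::MCStreamer &OS,
                                        llvm::MCContext &Ctx,
                                        const llvm::MachineBasicBlock &MBB) {
      // "mbb_keep_" is a made-up prefix chosen only to avoid ".L".
      llvm::MCSymbol *S =
          Ctx.getOrCreateSymbol("mbb_keep_" + llvm::Twine(MBB.getNumber()));
      OS.EmitLabel(S);  // emitted alongside the block's normal .LBB label
      return S;
    }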
2011 Apr 15
0
[LLVMdev] Scheduling - WAW Dependencies
...efore any of the other flag-affecting nodes, such as SU10. One such schedule is 8-13-12-14-3-11-2-7-10-5-6-9-4-1-0 (which is, incidentally, what my scheduler produces). However, when we look at the code generated from such a schedule, we see this: -------------------------------------------------- .LBB0_1: # %bb # =>This Inner Loop Header: Depth=1 addq $-123, %rdx leaq (%rax,%rax), %rsi imulq %rdx, %rsi decl %ecx leaq 1(%rax,%rax), %rax addq $456, %rsi # imm = 0x1C8 movq %rsi, %rdx jne .LBB0_1 ------------...
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
...n lowering >> the IR to machine code. However, the generated assembly doesn't seem to >> support this assumption :-( >> >> >> main: >> .cfi_startproc >> xorl %eax, %eax >> xorl %esi, %esi >> .align 16, 0x90 >> .LBB0_1: >> vmovups (%r8,%rax), %xmm0 >> vaddps (%rcx,%rax), %xmm0, %xmm0 >> vmovups %xmm0, (%rdx,%rax) >> addq $4, %rsi >> addq $16, %rax >> cmpq $61, %rsi >> jb .LBB0_1 >> retq >> >> I played...
2012 Jan 04
1
[LLVMdev] How can I compile a c source file to use SSE2 Data Movement Instructions?
...fragile-abi -fdiagnostics-show-option -fcolor-diagnostics -o test.s -x c test.c .def _f; .scl 2; .type 32; .endef .text .globl _f .align 16, 0x90 _f: # @f # BB#0: movl $-800, %eax # imm = 0xFFFFFFFFFFFFFCE0 movsd _DA, %xmm0 .align 16, 0x90 LBB0_1: # =>This Inner Loop Header: Depth=1 movsd _X+800(%eax), %xmm1 mulsd %xmm0, %xmm1 movsd _Y+800(%eax), %xmm2 subsd %xmm1, %xmm2 movsd %xmm2, _Y+800(%eax) addl $8, %eax jne LBB0_1 # BB#2: xorl %eax, %eax ret .data .globl _DA # @DA .al...
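A hypothetical reconstruction of the loop being compiled, consistent with the scalar SSE2 output above (800 bytes of doubles, i.e. 100-element arrays, updated as Y[i] -= DA * X[i]):

    /* Hypothetical reconstruction of test.c, inferred from the assembly
       above.  The generated code stays scalar (movsd/mulsd/subsd); packed
       SSE2 data movement such as movapd would only appear once the loop is
       actually vectorized. */
    #define N 100
    double DA;
    double X[N], Y[N];

    void f(void) {
      for (int i = 0; i < N; ++i)
        Y[i] -= DA * X[i];
    }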
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
...addq %r8, %rcx > sarq $2, %rcx > movq %rax, %rdx > shlq $5, %rdx > leaq 16(%r9,%rdx), %rsi > orq $16, %rdx > movq 16(%rsp), %rdi > addq %rdx, %rdi > addq 8(%rsp), %rdx > .align 16, 0x90 > .LBB0_1: > vmovaps -16(%rdx), %xmm0 > vmovaps (%rdx), %xmm1 > vmovaps -16(%rdi), %xmm2 > vmovaps (%rdi), %xmm3 > vmulps %xmm3, %xmm1, %xmm4 > vmulps %xmm2, %xmm1, %xmm1 > vfmadd213ss %xmm4, %xmm0, %xmm2 > vfmsub213ss %x...
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
...ttached IR with LLVM 3.6 llc -march=x86-64 -o f.S f.ll it generates an aligned ADDPS with unaligned address. See attached f.S, here an extract: addq $12, %r9 # $12 is not a multiple of 16 (4 floats), thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2 # =>This Inner Loop Header: Depth=1 movq offset_array3(,%rsi,8), %rdi movq offset_array2(,%rsi,8), %r10 movss -28(%rax), %xmm0 movss -8(%rax), %xmm1 movss -4...
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
...gisters get used when lowering the IR to machine code. However, the generated assembly doesn't seem to support this assumption :-( >> >> >> main: >> .cfi_startproc >> xorl %eax, %eax >> xorl %esi, %esi >> .align 16, 0x90 >> .LBB0_1: >> vmovups (%r8,%rax), %xmm0 >> vaddps (%rcx,%rax), %xmm0, %xmm0 >> vmovups %xmm0, (%rdx,%rax) >> addq $4, %rsi >> addq $16, %rax >> cmpq $61, %rsi >> jb .LBB0_1 >> retq >> >> I played with -m...
2013 Dec 12
0
[LLVMdev] AVX code gen
...## @f .cfi_startproc ## BB#0: ## %entry pushq %rbp Ltmp2: .cfi_def_cfa_offset 16 Ltmp3: .cfi_offset %rbp, -16 movq %rsp, %rbp Ltmp4: .cfi_def_cfa_register %rbp xorl %eax, %eax .align 4, 0x90 LBB0_1: ## %vector.body ## =>This Inner Loop Header: Depth=1 vmovups (%rdx,%rax,4), %ymm0 vmulps (%rsi,%rax,4), %ymm0, %ymm0 vaddps (%rdi,%rax,4), %ymm0, %ymm0 vmovups %ymm0, (%rdi,%rax,4)...
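A hypothetical reconstruction of f, inferred from the vector body above (the first argument is updated in place with the product of the other two, using unaligned 256-bit ymm loads and stores):

    /* Hypothetical reconstruction; the argument order follows the x86-64
       calling convention as it appears in the loop (%rdi, %rsi, %rdx). */
    void f(float *a, const float *b, const float *c, long n) {
      for (long i = 0; i < n; ++i)
        a[i] += b[i] * c[i];
    }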
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
..., %rcx >> movq %rax, %rdx >> shlq $5, %rdx >> leaq 16(%r9,%rdx), %rsi >> orq $16, %rdx >> movq 16(%rsp), %rdi >> addq %rdx, %rdi >> addq 8(%rsp), %rdx >> .align 16, 0x90 >> .LBB0_1: >> vmovaps -16(%rdx), %xmm0 >> vmovaps (%rdx), %xmm1 >> vmovaps -16(%rdi), %xmm2 >> vmovaps (%rdi), %xmm3 >> vmulps %xmm3, %xmm1, %xmm4 >> vmulps %xmm2, %xmm1, %xmm1 >> vfmadd213ss %xmm4, %xmm0...
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...cmovaq %rcx, %rax movq %rdi, %rsi notq %rsi addq %rax, %rsi shrq $2, %rsi incq %rsi xorl %edx, %edx movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8 andq %rsi, %rax pxor %xmm0, %xmm0 je .LBB0_1 # BB#2: # %vector.body.preheader leaq (%rdi,%rax,4), %r8 addq $16, %rdi movq %rsi, %rdx andq $-8, %rdx pxor %xmm0, %xmm0 pxor %xmm1, %xmm1 .align 16, 0x90 .LBB0_3: # %vector...
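A hypothetical reconstruction of the kind of simple loop under discussion; the long preamble in the assembly above is the vectorizer computing the trip count from the pointer difference, masking it down to a multiple of eight, and setting up two parallel xmm accumulators:

    /* Hypothetical reconstruction: a plain accumulation over a pointer
       range, which clang 3.5's loop vectorizer turns into the SSE code
       quoted above even though the source never asks for vectors. */
    int sum(const int *first, const int *last) {
      int s = 0;
      for (; first != last; ++first)
        s += *first;
      return s;
    }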