thr3ads.net - search: "bb0_2"

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 20

3

...4
    addu $gp, $2, $25
    sw $7, 76($sp)
    sw $6, 72($sp)
    sw $5, 68($sp)
    lw $3, %got(__stack_chk_guard)($gp)
    lw $1, 0($3)
    sw $1, 56($sp)
    sw $4, 52($sp)
    sw $zero, 48($sp) // i
    sw $zero, 44($sp) // val
    sw $zero, 40($sp) // sum
    addiu $1, $sp, 68
    sw $1, 16($sp) // arg_ptr1
    sw $zero, 48($sp)
    b $BB0_2
    addiu $2, $zero, 40
$BB0_1: # in Loop: Header=BB0_2 Depth=1
    lw $1, 0($4) // $1 = *arg_ptr
    sw $1, 44($sp) // val
    lw $4, 40($sp) // sum
    addu $1, $4, $1
    sw $1, 40($sp) // sum += val
    lw $1, 48($sp)
    addiu $1, $1, 1
    sw $1, 48($sp)
$BB0_2:...

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 20

0

...2($sp)
> sw $5, 68($sp)
> lw $3, %got(__stack_chk_guard)($gp)
> lw $1, 0($3)
> sw $1, 56($sp)
> sw $4, 52($sp)
> sw $zero, 48($sp) // i
> sw $zero, 44($sp) // val
> sw $zero, 40($sp) // sum
> addiu $1, $sp, 68
> sw $1, 16($sp) // arg_ptr1
> sw $zero, 48($sp)
> b $BB0_2
> addiu $2, $zero, 40
> $BB0_1: # in Loop: Header=BB0_2 Depth=1
> lw $1, 0($4) // $1 = *arg_ptr
> sw $1, 44($sp) // val
> lw $4, 40($sp) // sum
> addu $1, $4, $1
> sw $1, 40($sp) // sum += val
> lw $1, 48($sp)
> addiu $1, $1, 1
> sw $1...

[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs

2014 Oct 24

3

...m is not restricted to the NVPTX64 target. Below is a reduced example:

    __attribute__((global)) void foo(int n, int *output) {
      for (int i = 0; i < n; i += 3) {
        output[i] = i * i;
      }
    }

Without widening, the loop body in the PTX (a low-level assembly-like language generated by NVPTX64) is:

    BB0_2: // =>This Inner Loop Header: Depth=1
      mul.lo.s32 %r5, %r6, %r6;
      st.u32 [%rd4], %r5;
      add.s32 %r6, %r6, 3;
      add.s64 %rd4, %rd4, 12;
      setp.lt.s32 %p2, %r6, %r3;
      @%p2 bra BB0_2;

in whi...
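To make the term "widening" concrete, here is a minimal hand-written sketch in plain C (not CUDA, and not output of the IndVarSimplify pass). The names fill_narrow and fill_wide are hypothetical. The snippet above is cut off before the widened PTX, so the second function is only an assumption about what promoting the 32-bit induction variable to 64 bits looks like at source level: the index no longer needs sign extension, but the multiply now operates on a 64-bit value.

```c
#include <stdint.h>

/* Narrow form: the induction variable stays 32-bit, as written in the
 * reduced example from the thread. */
void fill_narrow(int n, int *output) {
    for (int i = 0; i < n; i += 3) {
        output[i] = i * i;
    }
}

/* Widened form (illustrative assumption): the induction variable is
 * promoted to 64 bits, so the multiply is done in 64 bits and the
 * result is truncated back to int before the store. */
void fill_wide(int n, int *output) {
    for (int64_t i = 0; i < (int64_t)n; i += 3) {
        output[i] = (int)(i * i);
    }
}
```

For small n where i * i does not overflow, both functions store the same values at indices 0, 3, 6, and so on; only the width of the arithmetic differs.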

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 19

0

Which part of the generated code do you think is not correct? Could you be more specific? I compiled this program with clang and ran it on a MIPS board. It returns the expected result (21).

On Tue, Feb 19, 2013 at 4:15 AM, Jonathan <gamma_chen at yahoo.com.tw> wrote:
> I checked the result of compiling the following C code fragment with the Mips backend.
> It does not seem correct. Is it my

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 19

2

I checked the result of compiling the following C code fragment with the Mips backend. It does not seem correct. Is it my misunderstanding, or is it a bug?

    //ch8_3.cpp
    #include <stdarg.h>

    int sum_i(int amount, ...)
    {
      int i = 0;
      int val = 0;
      int sum = 0;
      va_list vl;
      va_start(vl, amount);
      for (i = 0; i < amount; i++)
      {
        val = va_arg(vl, int);
        sum += val;
      }
      va_end(vl);
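Since the snippet is cut off after va_end(vl), here is a self-contained sketch of the same variadic pattern for anyone who wants to reproduce the test. The return statement, the assert on the count, and the example call are assumptions; the thread's actual caller is not shown in the snippet.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `amount` int arguments that follow the count.
 * The caller must pass exactly `amount` ints after the first argument;
 * va_arg has no way to detect a mismatch. */
int sum_i(int amount, ...) {
    assert(amount >= 0);

    int sum = 0;
    va_list vl;
    va_start(vl, amount);            /* start after the last named parameter */
    for (int i = 0; i < amount; i++) {
        sum += va_arg(vl, int);      /* fetch the next int and advance */
    }
    va_end(vl);
    return sum;                      /* assumed: the original is truncated here */
}
```

With this sketch, a call such as sum_i(6, 1, 2, 3, 4, 5, 6) returns 21. That matches the value quoted in the reply, but the argument list here is my own choice.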

[LLVMdev] Help with a Microblaze code generation problem.

2013 Oct 03

1

...8
    swi r3, r19, 24
    swi r0, r19, 28
    lwi r4, r19, 16
    xor r3, r4, r3
    lwi r4, r19, 20
    or r3, r4, r3
    addik r4, r0, 0
    addik r5, r0, 1
    swi r5, r19, 32
    beqid r3, ($BB0_2)
    swi r4, r19, 36
    lwi r3, r19, 36
    swi r3, r19, 32
$BB0_2:
    lwi r3, r19, 32
    add r1, r19, r0
    lwi r19, r1, 4
    rtsd r15, 8
    addik r1, r1, 40
    .end main

Which is very similar to t...

[LLVMdev] Branch delay slots broken.

2010 Dec 14

2

...this snippet:

    while (n--) *s++ = (char) c;

I get this (for the Microblaze):

    swi r19, r1, 0
    add r3, r0, r0
    cmp r3, r3, r7
    beqid r3, ($BB0_3)
    brid ($BB0_1)
    add r19, r1, r0
    add r3, r5, r0
$BB0_2:
    addi r4, r3, 1
    addi r7, r7, -1
    add r8, r0, r0
    sbi r6, r3, 0
    cmp r8, r8, r7
    bneid r8, ($BB0_2)
    brid ($BB0_3)
    add r3, r4, r0
$BB0_3:

Notice that the label $BB0_1 is missing. If I disab...

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 06

4

...// these 4 lines is
        crc >>= 1; // rather poor!
      }
      return ~crc;
    }

See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)

crc32be: # @crc32be
    xor eax, eax
    test esi, esi
    jne .LBB0_2
    jmp .LBB0_5
.LBB0_4: # in Loop: Header=BB0_2 Depth=1
    add rdi, 1
    test esi, esi
    je .LBB0_5
.LBB0_2: # =>This Loop Header: Depth=1
    add esi, -1
    movzx edx, byte ptr [rdi]
    shl edx, 24
    xor edx, eax
    mov ecx,...

[LLVMdev] LICM promoting memory to scalar

2014 Sep 02

3

...oii
// BB#0: // %entry
    cbz w0, .LBB0_5
// BB#1: // %for.body.lr.ph
    mov w8, wzr
    cmp w0, #0 // =0
    cinc w9, w0, lt
    asr w9, w9, #1
    adrp x10, globalvar
.LBB0_2: // %for.body
    // =>This Inner Loop Header: Depth=1
    cmp w8, w9
    b.hs .LBB0_4
// BB#3: // %if.then
    // in Loop: Header=BB0_2 Depth=1...

KNL Assembly Code for Matrix Multiplication

2017 Jul 01

2

...+ .LCPI0_6]
>>>>> .p2align 4, 0x90
>>>>> .LBB0_1: # %.preheader26
>>>>> # =>This Loop Header: Depth=1
>>>>> # Child Loop BB0_2 Depth 2
>>>>> # Child Loop BB0_3 Depth 3
>>>>> # Child Loop BB0_5 Depth 3
>>>>> xor r11d, r11d
>>>>> .p2a...

[LLVMdev] LICM promoting memory to scalar

2014 Sep 02

2

...bz w0, .LBB0_5
>> // BB#1: // %for.body.lr.ph
>> mov w8, wzr
>> cmp w0, #0 // =0
>> cinc w9, w0, lt
>> asr w9, w9, #1
>> adrp x10, globalvar
>> .LBB0_2: // %for.body
>> // =>This Inner Loop Header: Depth=1
>> cmp w8, w9
>> b.hs .LBB0_4
>> // BB#3: // %if.then
>>...

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 27

2

...>> }
>> return ~crc;
>> }
>>
>> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)
>>
>> crc32be: # @crc32be
>> xor eax, eax
>> test esi, esi
>> jne .LBB0_2
>> jmp .LBB0_5
>> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> add rdi, 1
>> test esi, esi
>> je .LBB0_5
>> .LBB0_2: # =>This Loop Header: Depth=1
>> add esi, -1
>> movzx edx, byte ptr [...

[LLVMdev] LICM promoting memory to scalar

2014 Sep 03

3

...oii
// BB#0: // %entry
    cbz w0, .LBB0_5
// BB#1: // %for.body.lr.ph
    mov w8, wzr
    cmp w0, #0 // =0
    cinc w9, w0, lt
    asr w9, w9, #1
    adrp x10, globalvar
.LBB0_2: // %for.body
    // =>This Inner Loop Header: Depth=1
    cmp w8, w9
    b.hs .LBB0_4
// BB#3: // %if.then
    // in Loop: Header=BB0_2 Depth=1...

How to remove memcpy

2016 Oct 15

3

...l16(memcpy)($17)
    addiu $16, $fp, 1248
    move $4, $16
    addiu $6, $zero, 400
    jalr $25
    move $gp, $17
    lw $1, %got($main.b)($17)
    addiu $5, $1, %lo($main.b)
    lw $25, %call16(memcpy)($17)
    addiu $17, $fp, 848
    move $4, $17
    jalr $25
    addiu $6, $zero, 400
    sw $zero, 820($fp)
    sw $zero, 844($fp)
    addiu $2, $fp, 420
    b $BB0_2
    addiu $3, $fp, 20
$BB0_1:

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 28

2

...> See <https://godbolt.org/z/eYJeWt> (-O1) and <https://godbolt.org/z/zeExHm> (-O2)
>> >>
>> >> crc32be: # @crc32be
>> >> xor eax, eax
>> >> test esi, esi
>> >> jne .LBB0_2
>> >> jmp .LBB0_5
>> >> .LBB0_4: # in Loop: Header=BB0_2 Depth=1
>> >> add rdi, 1
>> >> test esi, esi
>> >> je .LBB0_5
>> >> .LBB0_2: # =>This Loop Header: Depth=1
>> >>...