thr3ads.net - similar to: "[LLVMdev] Branch delay slots broken."

Displaying 20 results from an estimated 500 matches similar to: "[LLVMdev] Branch delay slots broken."

[LLVMdev] Help with a Microblaze code generation problem.

2013 Oct 03


Sorry if this is a duplicate: I tried to send it last night and it didn't go through. I'm trimming some text to see if it helps. I have a simple program that fails on the Microblaze:

int main()
{
    unsigned long long x, y;
    x = 100;
    y = 0x8000000000000000ULL;
    return !(x > y);
}

As you can see, the test case compares two unsigned long long values. To try to track
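To make the test case above concrete: on a 32-bit target, a 64-bit unsigned comparison has to be built from 32-bit operations. The following is a minimal sketch of one common way to do that, not the Microblaze backend's actual lowering. The helper names `ugt64_from_halves` and `test_case_result` are hypothetical and do not come from the thread or from LLVM.

```c
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration): returns 1 when the
 * 64-bit value x_hi:x_lo is greater than y_hi:y_lo as unsigned numbers,
 * using only 32-bit comparisons. */
int ugt64_from_halves(uint32_t x_hi, uint32_t x_lo,
                      uint32_t y_hi, uint32_t y_lo)
{
    if (x_hi != y_hi)
        return x_hi > y_hi;   /* the high words decide */
    return x_lo > y_lo;       /* high words tie: the low words decide */
}

/* Computes the same value as the test case's return expression, !(x > y). */
int test_case_result(void)
{
    uint64_t x = 100;
    uint64_t y = 0x8000000000000000ULL;
    return !ugt64_from_halves((uint32_t)(x >> 32), (uint32_t)x,
                              (uint32_t)(y >> 32), (uint32_t)y);
}
```

With x = 100 and y = 0x8000000000000000, the high words are 0 and 0x80000000, so x > y is false and a correct build of the test case should return 1.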

[LLVMdev] Branch delay slots broken.

2010 Dec 14


On Dec 14, 2010, at 3:46 PM, Richard Pennington wrote:
> Notice that the label $BB0_1 is missing. If I disable filling in the
> branch delay slots, I get:

Is this with the latest SVN HEAD version of LLVM or some other version? The delay slot filler and many other things have been updated for the Microblaze backend. In particular, the commit r120095 for the MBlaze backend fixed some issues

[LLVMdev] Branch delay slots broken.

2010 Dec 14


On 12/14/2010 04:28 PM, Wesley Peck wrote:
> On Dec 14, 2010, at 3:46 PM, Richard Pennington wrote:
>> Notice that the label $BB0_1 is missing. If I disable filling in the
>> branch delay slots, I get:
>
> Is this with the latest SVN HEAD version of LLVM or some other version? The delay slot filler and many other things have been updated for the Microblaze backend. In

KNL Assembly Code for Matrix Multiplication

2017 Jul 01


Thank you. It means:

vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]

zmm22 will contain 64-bit constant values, which are the indexes here (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from these locations. zmm2 contains the constant 4000, so

vpmuludq zmm14, zmm10, zmm2

will multiply the index values by 4000, since the stride for array b is 4000. zmm14=
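The index-times-stride step described above can be modeled in scalar C. This is a sketch under stated assumptions: `vpmuludq_model` and `scaled_indexes` are hypothetical names, and the index vector is assumed to hold 8..15 purely for illustration (the snippet does not show what zmm10 contains). `vpmuludq` multiplies the low 32 bits of each 64-bit lane and produces a full 64-bit product per lane.

```c
#include <stdint.h>

#define LANES 8  /* a 512-bit zmm register holds eight 64-bit lanes */

/* Scalar model of vpmuludq: each 64-bit result lane is the product of the
 * low 32 bits of the corresponding lanes of the two sources. */
void vpmuludq_model(uint64_t dst[LANES],
                    const uint64_t a[LANES], const uint64_t b[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = (uint64_t)(uint32_t)a[i] * (uint64_t)(uint32_t)b[i];
}

/* Hypothetical driver: indexes 8..15 (assumed) times a broadcast stride
 * of 4000, giving per-lane element offsets into array b. */
void scaled_indexes(uint64_t out[LANES])
{
    uint64_t idx[LANES], stride[LANES];
    for (int i = 0; i < LANES; i++) {
        idx[i] = 8 + (uint64_t)i;
        stride[i] = 4000;
    }
    vpmuludq_model(out, idx, stride);
}
```

Under these assumptions the first lane holds 8 * 4000 = 32000 and the last lane holds 15 * 4000 = 60000.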

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 20


Does it make a difference if you give the "-target" option to clang?

$ clang -target mips-linux-gnu ch8_3.cpp -o ch8_3.bc -emit-llvm -c

The .s file generated this way looks quite different from the one in your email.

On Tue, Feb 19, 2013 at 5:06 PM, Jonathan <gamma_chen at yahoo.com.tw> wrote:
> I didn't have Mips board. I compile as the commands and check the asm
>

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 20


I don't have a Mips board. I compiled with the commands and checked the asm output, as below.
1. Question: The distance between the caller's arg[4] and arg[5] is 4 bytes, but the callee gets every arg[] at an 8-byte offset (arg_ptr1+8 or arg_ptr2+8). I assume #BB#4 and #BB#5 are the arg_ptr, which is the pointer used to access the stack arguments.
2. Question: Stack memory 28($sp) has no initial value. If

[LLVMdev] Branch delay slots broken.

2010 Dec 15


On 12/14/2010 04:32 PM, Richard Pennington wrote:
> On 12/14/2010 04:28 PM, Wesley Peck wrote:
>> On Dec 14, 2010, at 3:46 PM, Richard Pennington wrote:
>>> Notice that the label $BB0_1 is missing. If I disable filling in the
>>> branch delay slots, I get:
>>
>> Is this with the latest SVN HEAD version of LLVM or some other version? The delay slot filler and

How to remove memcpy

2016 Oct 15


Hi, I am hoping that someone can help me figure out how to prevent the insertion of "memcpy" into the assembly source. My target is an instruction set simulator that doesn't support this. Thank you for your valuable time. Wolf

*Here are my compile commands:*

$ clang -emit-llvm -fno-builtin -o3 --target=mips -S matrix_float.c -o vl_matrix_float.ll
$ llc vl_matrix_float.ll

*IR

[LLVMdev] How to check for "SPARC code generation" in MachineBasicBlock.cpp?

2010 Feb 09


On 09/02/2010, at 3:57 AM, Chris Lattner wrote:
> On Feb 8, 2010, at 12:37 AM, Nathan Keynes wrote:
>> Firstly, the BNE/BA pair should be reduced to a BE (I assume this is the responsibility of AnalyzeBranch and friends that you mention).
>
> Right. Implementing AnalyzeBranch will allow a bunch of block layout and branch optimizations to happen.
>
>> However I still

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 19


Which part of the generated code do you think is not correct? Could you be more specific? I compiled this program with clang and ran it on a mips board. It returns the expected result (21).

On Tue, Feb 19, 2013 at 4:15 AM, Jonathan <gamma_chen at yahoo.com.tw> wrote:
> I check the Mips backend for the following C code fragment compile result.
> It seems not correct. Is it my

[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs

2014 Oct 24


Hi, I noticed a significant performance regression (up to 40%) on some internal CUDA benchmarks (a reduced example presented below). The root cause of this regression seems to be that IndVarSimplify widens induction variables on the assumption that arithmetic on wider integer types is as cheap as on narrower ones. However, this assumption is wrong at least for the NVPTX64 target. Although the NVPTX64 target
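As a rough illustration of what widening an induction variable means, the sketch below writes both forms by hand in plain C. It is not the reduced CUDA example from the message (which is cut off here) and not actual IndVarSimplify output; the function names `sum_narrow` and `sum_widened` are hypothetical.

```c
#include <stdint.h>

/* Narrow form: the induction variable i is 32-bit. On a 64-bit target it
 * must be sign-extended before it can be used to index the array. */
int64_t sum_narrow(const int32_t *a, int32_t n)
{
    int64_t s = 0;
    for (int32_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Widened form (hand-written approximation): the induction variable is
 * 64-bit, so no per-iteration extension is needed, but the increment and
 * the loop comparison are now 64-bit operations. */
int64_t sum_widened(const int32_t *a, int32_t n)
{
    int64_t s = 0;
    int64_t wide_n = n;
    for (int64_t i = 0; i < wide_n; i++)
        s += a[i];
    return s;
}
```

Both functions compute the same result. The trade-off the message points at is that the widened loop only wins when 64-bit arithmetic costs about the same as 32-bit arithmetic on the target.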

[LLVMdev] [Q] x86 peephole deficiency

2010 Oct 07


Hi all,

I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125) and now I am running into a deficiency of the x86 peephole optimizer (or jump-threader?). Here is what I get:

        andl    $3, %edi
        je      .LBB0_4
# BB#2:                         # %nz
                                # in Loop: Header=BB0_1 Depth=1
        cmpl    $2, %edi

[LLVMdev] Is va_arg correct on Mips backend?

2013 Feb 19


I checked the compile result of the Mips backend for the following C code fragment. It does not seem correct. Is it my misunderstanding, or is it a bug?

//ch8_3.cpp
#include <stdarg.h>

int sum_i(int amount, ...)
{
  int i = 0;
  int val = 0;
  int sum = 0;

  va_list vl;
  va_start(vl, amount);
  for (i = 0; i < amount; i++)
  {
    val = va_arg(vl, int);
    sum += val;
  }
  va_end(vl);

[LLVMdev] [Q] x86 peephole deficiency

2010 Oct 07


On Oct 6, 2010, at 6:16 PM, Gabor Greif wrote:
> Hi all,
>
> I am slowly working on a SwitchInst optimizer (http://llvm.org/PR8125)
> and now I am running into a deficiency of the x86
> peephole optimizer (or jump-threader?). Here is what I get:
>
>
>         andl    $3, %edi
>         je      .LBB0_4
> # BB#2:                         # %nz
>

[LLVMdev] How to check for "SPARC code generation" in MachineBasicBlock.cpp?

2010 Feb 08


On Feb 8, 2010, at 12:37 AM, Nathan Keynes wrote:
> Firstly, the BNE/BA pair should be reduced to a BE (I assume this is
> the responsibility of AnalyzeBranch and friends that you mention).

Right. Implementing AnalyzeBranch will allow a bunch of block layout and branch optimizations to happen.

> However I still wouldn't have expected that to result in the label
> being

[LLVMdev] How to check for "SPARC code generation" in MachineBasicBlock.cpp?

2010 Feb 08


On 11/12/2009, at 10:43 AM, Anton Korobeynikov wrote:
> Hi, Chris
>
>> That is target independent code, so you should not put sparc specific changes there. It sounds like one of the sparc-specific target hooks is wrong.
> Since sparc does not provide any hooks for operation of branches (e.g.
> AnalyzeBranch and friends) it might be possible that generic codegen
> code is

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

2012 Apr 25


For the following code fragment,

; <label>:27                                      ; preds = %27, %entry
  %28 = load volatile i32* inttoptr (i64 2149581832 to i32*), align 8
  %29 = icmp slt i32 %28, 0
  br i1 %29, label %27, label %loop.exit

loop.exit:                                        ; preds = %27

llc will generate following MIPS code,

$BB0_1:
  lui $3, 32800
  ori $3, $3, 1032
  lw
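The `lui`/`ori` pair in the MIPS output above builds the 32-bit address constant from the IR in two 16-bit pieces. The arithmetic can be checked with a small C model; the function name `mips_lui_ori` is hypothetical and exists only for this illustration.

```c
#include <stdint.h>

/* Model of the MIPS sequence:  lui rt, hi  followed by  ori rt, rt, lo. */
uint32_t mips_lui_ori(uint16_t hi, uint16_t lo)
{
    uint32_t reg = (uint32_t)hi << 16;  /* lui: hi goes into the upper 16 bits, lower bits are zero */
    reg |= lo;                          /* ori: OR in the zero-extended 16-bit immediate */
    return reg;
}
```

`mips_lui_ori(32800, 1032)` gives 0x80200408, which is 2149581832 in decimal, the address used in the `inttoptr` expression.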

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

2012 Apr 29


On Apr 24, 2012, at 11:48 PM, Fan Dawei wrote:
> For the following code fragment,
>
> ; <label>:27                                      ; preds = %27, %entry
>   %28 = load volatile i32* inttoptr (i64 2149581832 to i32*), align 8
>   %29 = icmp slt i32 %28, 0
>   br i1 %29, label %27, label %loop.exit
>
> loop.exit:                                        ; preds = %27

A code layout related side-effect introduced by rL318299

2017 Dec 19


Hi, Recently a 10% performance regression on an important benchmark showed up after we integrated https://reviews.llvm.org/rL318299. The analysis showed that rL318299 triggered loop rotation on a multi-exit loop, and the loop rotation introduced a code layout issue. The performance regression is a side-effect of rL318299. I have attached two testcases, a.ll and b.ll, to illustrate the problem. a.ll

A code layout related side-effect introduced by rL318299

2017 Dec 19


On Mon, Dec 18, 2017 at 5:46 PM Xinliang David Li <davidxl at google.com> wrote:
> The introduction of cleanup.cond block in b.ll without loop-rotation
> already makes the layout worse than a.ll.
>
>
> Without introducing cleanup.cond block, the layout is
>
> entry->while.cond -> while.body->ret
>
> All the arrows are hot fall through edges which is
