thr3ads.net - similar to: "[LLVMdev] Missing Optimization Opportunities"

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Missing Optimization Opportunities"

[LLVMdev] Unrolling loops into constant-time expressions

2010 Nov 23

Hello, I've come across another example: I'm compiling with clang -S -emit-llvm -std=gnu99 -O3 clang version 2.9 (trunk 118238) Target: x86_64-unknown-linux-gnu Thread model: posix I take the code: int loops(int x) { int ret = 0; for(int i = 0; i < x; i++) { for(int j = 0; j < x; j++) { ret += 1; } } return ret; } and the
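For readers skimming this result, here is a minimal sketch of what a "constant-time expression" for the quoted function could look like. The name loops_closed_form and the helper check_range are ours, not from the post, and the closed form is only claimed for inputs where x * x does not overflow an int.

```c
#include <assert.h>

/* The nested loop as quoted in the post. */
int loops(int x) {
    int ret = 0;
    for (int i = 0; i < x; i++) {
        for (int j = 0; j < x; j++) {
            ret += 1;
        }
    }
    return ret;
}

/* Hypothetical closed form: the body runs x * x times when x > 0,
 * and not at all otherwise. Assumes x * x fits in an int. */
int loops_closed_form(int x) {
    return x > 0 ? x * x : 0;
}

/* Helper (ours): checks that both versions agree on [lo, hi]. */
void check_range(int lo, int hi) {
    for (int x = lo; x <= hi; x++) {
        assert(loops(x) == loops_closed_form(x));
    }
}
```

Calling check_range(-3, 50) exercises both the empty-loop case (x <= 0) and small positive trip counts.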

[LLVMdev] Labels

2008 Jan 12

I'm attempting to modify a parser generator to emit LLVM code instead of C. So far the experience has been trivial, but I am now running into an error regarding labels that I can't seem to solve. Situation 1: A label is used immediately after a void function call (l6 in this case): <snip> %tmp26 = load i32* @yybegin, align 4 %tmp27 = load i32* @yyend, align 4 call void

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

2015 Jun 11

[+Arnold] > On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > [+CC Andy] > >> Can anyone familiar with ScalarEvolution tell me whether this is an >> expected behavior or a bug? > > Assuming you're talking about 2*k, this is a bug. ScalarEvolution > should be able to prove that {0,+,4} is <nsw> and

[LLVMdev] ARM backend problem ?

2007 Jun 12

Hello, I want to compile an LLVM file into an executable running on an ARM platform. I use LLVM 2.0 with the following command lines: llvm-as -f -o test.bc test.ll llc -march=arm -mcpu=arm1136j-s -mattr=+v6 -f -o test.s test.bc arm-linux-gnu-as -mcpu=arm1136j-s test.s With the last command, I get the following error: rd and rm should be different in mul The bad instruction is

[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?

2013 Oct 27

The following piece of IR is a fixed point for opt -std-compile-opts/-O3: --- target datalayout = "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: nounwind readonly define i32 @get32Bits(i8*

[LLVMdev] ARM backend problem ?

2007 Jun 12

Hi Mikael, You are getting a warning, not an error, right? Most ARM cores, including the arm1136, can execute mul with rd = rm, so you can ignore this warning. Lauro 2007/6/12, Peltier, Mikael <m-peltier at ti.com>: > > > > > Hello, > > > > I want to compile a LLVM file into an executable running on ARM platform. > > I use LLVM 2.0 with the following

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

2015 Jun 10

I am testing vectorization on the following test case: float x[1024], y[1024]; void myloop1() { for (long int k = 0; k < 512; k++) { x[2*k] = x[2*k]+y[k]; } } Vectorization failed due to "unsafe dependent memory operation". I traced through LoopAccessAnalysis.cpp and found that the reason is that the NoWrap flag for SCEVAddRecExpr is not set and consequently the

[LLVMdev] Hoisting elements of array argument into registers

2010 Nov 06

I am seeing the wf loop get optimized just fine with llvm 2.8 (and almost as good with head). I'm running on Mac OS X 10.6. I have an apple supplied llvm-gcc and a self compiled llvm 2.8. When I run $ llvm-gcc -emit-llvm -S M.c $ opt -O2 M.s | llvm-dis I see that: 1. Tail recursion has been eliminated from wf 2. The accesses to sp have been promoted to registers 3. The loop has

[LLVMdev] Hoisting elements of array argument into registers

2010 Nov 07

David Peixotto <dmp <at> rice.edu> writes: > I am seeing the wf loop get optimized just fine with llvm 2.8 (and almost as good with head). I rechecked this and I am actually seeing the same results as you. I think I must have made a stupid mistake in my tests before - sorry for the noise. However, I found that we have a phase ordering problem which is preventing us from getting as much

[LLVMdev] spilling & xmm register usage

2010 Sep 29

Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on Linux compared to Windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated; the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

2015 Jun 11

> On Jun 10, 2015, at 6:17 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > I'm not sure if inbounds can be used to prove <nuw>. If an object > %OBJ is allocated at address -1 then "gep inbounds %OBJ 1" is not > poison, but the underlying computation unsigned-overflows. I think that this should yield poison per langref because the signed

[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?

2013 Oct 28

On Oct 27, 2013 2:16 PM, "David Nadlinger" <code at klickverbot.at> wrote: > > The following piece of IR is a fixed point for opt -std-compile-opts/-O3: > > --- > target datalayout = > "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

[LLVMdev] scalar-evolution + indvars fail to get the loop trip count?

2008 Dec 09

Hi, It seems the scalar-evolution + indvars passes fail to compute the loop trip count in the following case: int foo(int x, int y, int lam[256], int alp[256]) { int i; int z = y; for (i = 255; i >= 0; i--) { z += x; lam[i] = alp[i]; } return z; } The final optimized .ll code is: define i32 @foo(i32 %x, i32 %y, i32* %lam, i32* %alp) nounwind { entry: br label %bb bb:
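To make the expected trip count concrete: i takes the values 255 down to 0, so the loop body runs 256 times. The sketch below is a hand-written illustration of what knowing that count would allow, not output from any LLVM pass. The name foo_closed_form is ours, and the equivalence assumes y + 256 * x does not overflow an int.

```c
/* The function as quoted in the post: the body executes 256 times. */
int foo(int x, int y, int lam[256], int alp[256]) {
    int i;
    int z = y;
    for (i = 255; i >= 0; i--) {
        z += x;
        lam[i] = alp[i];
    }
    return z;
}

/* Hypothetical equivalent once the trip count (256) is known:
 * the accumulation of x folds into a single multiply-add, and
 * only the array copy still needs a loop. The copy keeps the
 * original descending order so overlapping arrays behave the same. */
int foo_closed_form(int x, int y, int lam[256], int alp[256]) {
    int i;
    for (i = 255; i >= 0; i--) {
        lam[i] = alp[i];
    }
    return y + 256 * x;
}
```

Both functions return the same value and leave lam with the same contents for any inputs within the stated overflow assumption.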

[LLVMdev] Vector code

2008 May 08

Hi Matthijs, Yes, I've turned off the link-time optimizations (otherwise it just propagates my constant vectors and immediately prints the result). :-) Here's essentially what I try to generate: void add(float z[4], float x[4], float y[4]) { z[0] = x[0] + y[0]; z[1] = x[1] + y[1]; z[2] = x[2] + y[2]; z[3] = x[3] + y[3]; } And here's part of the output from the online

[LLVMdev] Vector code

2008 May 08

LLVM does not automatically vectorize your scalar code (at least for now). You have to write gcc generic vector code or use vector builtins. Evan On May 8, 2008, at 1:46 PM, Nicolas Capens wrote: > Hi Matthijs, > > Yes, I've turned off the link-time optimizations (otherwise it just > propagates my constant vectors and immediate prints the result). :-) > > Here's
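As an illustration of the "gcc generic vector code" the reply mentions, here is a minimal sketch of the add function rewritten with the GCC/Clang vector_size extension. The type name v4f is ours, and the memcpy-based loads and stores are just one conservative way to move data between float arrays and the vector type. Nothing here is claimed about what a particular LLVM version emits for it.

```c
#include <string.h>

/* GCC/Clang generic vector type: four floats, 16 bytes.
 * The name v4f is hypothetical, chosen for this sketch. */
typedef float v4f __attribute__((vector_size(16)));

/* Same signature as the scalar add() in the thread, but the
 * addition is expressed as one operation on a vector type. */
void add(float z[4], float x[4], float y[4]) {
    v4f vx, vy, vz;
    memcpy(&vx, x, sizeof vx);  /* load four floats from x */
    memcpy(&vy, y, sizeof vy);  /* load four floats from y */
    vz = vx + vy;               /* element-wise vector add */
    memcpy(z, &vz, sizeof vz);  /* store four floats to z */
}
```

The element-wise + on v4f is part of the compiler extension, so this compiles with GCC and Clang but is not portable ISO C.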

[LLVMdev] spilling & xmm register usage

2010 Sep 29

On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on linux > compared to windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems

[LLVMdev] A question about GetElementPtr common subexpression elimination/loop invariant code motion

2007 Jan 29

Hello. I have a problem which is quite basic for array optimization, and I wonder whether I am missing something, but I could not find the LLVM pass that does it. Consider the following code snippet: int test() { int mat[7][7][7]; int i,j,k,sum=0; for(i=0;i<7;i++){ for(j=0;j<7;j++){ for(k=0;k<7;k++){ sum+=mat[i][j][k]^mat[i][j][k^1]; } } } return

[LLVMdev] llvm-gcc + abi stuff

2008 Jan 24

<moving this to llvmdev instead of commits> On Jan 22, 2008, at 11:23 PM, Duncan Sands wrote: >> Okay, well we already get many other x86-64 issues wrong already, but >> Evan is chipping away at it. How do you pass an array by value in C? >> Example please, > > I find the x86-64 ABI hard to interpret, but it seems to say that > aggregates are classified

[LLVMdev] 64bit MRV problem: { float, float, float} -> { double, float }

2010 Jan 29

Hey Duncan, hey everybody else, I just stumbled upon a problem in the latest llvm-gcc trunk which is related to my previous problem with the 64bit ABI and structs: Given the following code: struct float3 { float x, y, z; }; extern "C" void __attribute__((noinline)) test(float3 a, float3* res) { res->y = a.y; } int main(void) { float3 a; float3 res; test(a,

[LLVMdev] Vector code

2008 May 08

Hi Nicolas (at least, I suspect your signing of your mail with "Anton" was not intentional :-p), > I assume that's the same as the online demo's "Show LLVM C++ API code" > option (http://llvm.org/demo/)? I've tried that with a structure containing > four floating-point components but it also appears to add them individually > using extract/insert. Maybe