thr3ads.net - similar to: "[LLVMdev] Unrolling loops into constant-time expressions"

Displaying 20 results from an estimated 800 matches similar to: "[LLVMdev] Unrolling loops into constant-time expressions"

[LLVMdev] Hoisting elements of array argument into registers

2010 Nov 07

David Peixotto <dmp <at> rice.edu> writes: > I am seeing the wf loop get optimized just fine with llvm 2.8 (and almost as good with head). I rechecked this and I am actually seeing the same results as you. I think I must have made a stupid mistake in my tests before - sorry for the noise. However, I found that we have a phase ordering problem which is preventing us getting as much

[LLVMdev] Hoisting elements of array argument into registers

2010 Nov 06

I am seeing the wf loop get optimized just fine with llvm 2.8 (and almost as good with head). I'm running on Mac OS X 10.6. I have an apple supplied llvm-gcc and a self compiled llvm 2.8. When I run $ llvm-gcc -emit-llvm -S M.c $ opt -O2 M.s | llvm-dis I see that: 1. Tail recursion has been eliminated from wf 2. The accesses to sp have been promoted to registers 3. The loop has

[LLVMdev] Labels

2008 Jan 12

I'm attempting to modify a parser generator to emit LLVM code instead of C. So far the experience has been trivial, but I am now running into an error regarding labels that I can't seem to solve. Situation 1: A label is used immediately after a void function call (l6 in this case): <snip> %tmp26 = load i32* @yybegin, align 4 %tmp27 = load i32* @yyend, align 4 call void

[LLVMdev] ARM backend problem ?

2007 Jun 12

Hello, I want to compile an LLVM file into an executable running on an ARM platform. I use LLVM 2.0 with the following command lines: llvm-as -f -o test.bc test.ll llc -march=arm -mcpu=arm1136j-s -mattr=+v6 -f -o test.s test.bc arm-linux-gnu-as -mcpu=arm1136j-s test.s With the last command, I obtain the following error: rd and rm should be different in mul The bad instruction is

[LLVMdev] ARM backend problem ?

2007 Jun 12

Hi Mikael, You are getting a warning, not an error, right? Most ARM cores, including the arm1136, can execute mul with rd = rm. So, you can ignore this warning. Lauro 2007/6/12, Peltier, Mikael <m-peltier at ti.com>: > > > > > Hello, > > > > I want to compile a LLVM file into an executable running on ARM platform. > > I use LLVM 2.0 with the following

[LLVMdev] spilling & xmm register usage

2010 Sep 29

On Sep 29, 2010, at 8:35 AM PDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on linux > compared to windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems

[LLVMdev] spilling & xmm register usage

2010 Sep 29

Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under

[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?

2013 Oct 27

The following piece of IR is a fixed point for opt -std-compile-opts/-O3: --- target datalayout = "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: nounwind readonly define i32 @get32Bits(i8*

[LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?

2013 Oct 28

On Oct 27, 2013 2:16 PM, "David Nadlinger" <code at klickverbot.at> wrote: > > The following piece of IR is a fixed point for opt -std-compile-opts/-O3: > > --- > target datalayout = > "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

[LLVMdev] Query on optimization and tail call.

2008 Jun 11

Hi, While playing around on the LLVM, I tried this code: int sum(int n) { if (n == 0) return 0; else return n + sum(n-1); } and this is what "llvm-gcc -O2" gave me: define i32 @sum(i32 %n) nounwind { entry: %tmp215 = icmp eq i32 %n, 0 ; <i1> [#uses=1] br i1 %tmp215, label %bb10, label %tailrecurse.bb10_crit_edge tailrecurse.bb10_crit_edge: ; preds =
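The `tailrecurse` block names in the IR above suggest the recursion was turned into a loop. As a rough illustration only (not the actual output of llvm-gcc), the C sketch below puts the recursive function from the post next to a hand-written accumulator loop. The names `sum_recursive` and `sum_loop` are hypothetical, and the equivalence is assumed only for non-negative `n` whose sum fits in an `int`.

```c
/* Recursive form, as in the post: the addition happens after the
 * recursive call returns, so the call is not in tail position as written. */
int sum_recursive(int n) {
    if (n == 0)
        return 0;
    return n + sum_recursive(n - 1);
}

/* Hand-written loop with an accumulator (hypothetical name).
 * Because integer addition is associative, the pending "n + ..." can be
 * folded into a running total, which removes the need for a call stack.
 * Assumes n >= 0 and that the total fits in an int. */
int sum_loop(int n) {
    int acc = 0;
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```

Both functions add the same values, just in a different order, so they agree wherever neither overflows.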

[LLVMdev] Missing Optimization Opportunities

2010 Sep 10

Hi, I'm using LLVM 2.7 right now, and I found "opt -std-compile-opts" has missed some opportunities for optimization: define void @spa.main() readonly { entry: %tmp = load i32* @dst-ip ; <i32> [#uses=3] %tmp1 = and i32 %tmp, -16777216 ; <i32> [#uses=1] %tmp2 = icmp eq i32 %tmp1, 167772160 ; <i1> [#uses=2]

[LLVMdev] linux/x86-64 codegen support

2008 Feb 16

libcpp/charset.c:631 turns into: %tmp16 = tail call i64 @strlen( i8* %to ) nounwind readonly ; <i64> [#uses=1] %tmp18 = tail call i64 @strlen( i8* %from ) nounwind readonly ; <i64> [#uses=1] %tmp19 = add i64 %tmp16, 2 ; <i64> [#uses=1] %tmp20 = add i64 %tmp19, %tmp18 ; <i64> [#uses=1] %tmp21 = tail

[LLVMdev] linux/x86-64 codegen support

2008 Feb 16

PR1711 is an x86 codegen problem that is blocking adoption of llvm-gcc by people using linux on x86-64 boxes. Could someone with access to one of these boxes take a look? I'll help try to debug this, but I don't have access to a machine. I bet it's a small tweak required in the x86 backend. Thanks! -Chris

[LLVMdev] Unrolling power sum calculations into constant time expressions

2010 Nov 23

Hello, I noticed that feeding 'clang -O3' with functions like: int sum1(int x) { int ret = 0; for(int i = 0; i < x; i++) ret += i; return ret; } int sum2(int x) { int ret = 0; for(int i = 0; i < x; i++) ret += i*i; return ret; } ... int sum20(int x) { int ret = 0; for(int i = 0; i < x; i++) ret +=
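For reference, the first two loops in the post have well-known closed forms: the sum of i for 0 <= i < x is x(x-1)/2, and the sum of i*i is (x-1)x(2x-1)/6. The sketch below writes them out in C next to the loops. The `_loop` and `_closed` names are hypothetical, and the code makes no claim about what clang actually emits. It assumes the result fits in an `int`.

```c
/* Loop forms, following sum1 and sum2 from the post. */
int sum1_loop(int x) {
    int ret = 0;
    for (int i = 0; i < x; i++)
        ret += i;
    return ret;
}

int sum2_loop(int x) {
    int ret = 0;
    for (int i = 0; i < x; i++)
        ret += i * i;
    return ret;
}

/* Closed forms (hypothetical names). The loops run zero times for
 * x <= 0, so those inputs return 0. Intermediates are widened to
 * long long; the caller is assumed to pass x small enough that the
 * final result fits in an int. */
int sum1_closed(int x) {
    if (x <= 0)
        return 0;
    long long n = x;
    return (int)(n * (n - 1) / 2);
}

int sum2_closed(int x) {
    if (x <= 0)
        return 0;
    long long n = x;
    return (int)((n - 1) * n * (2 * n - 1) / 6);
}
```

The closed forms run in constant time regardless of `x`, which is the effect the thread title describes.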

[LLVMdev] A question about GetElementPtr common subexpression elimination/loop invariant code motion

2007 Jan 29

Hello. I have a problem which is quite basic for array optimization, and I wonder whether I am missing something, but I could not find the LLVM pass that does it. Consider the following code snippet: int test() { int mat[7][7][7]; int i,j,k,sum=0; for(i=0;i<7;i++){ for(j=0;j<7;j++){ for(k=0;k<7;k++){ sum+=mat[i][j][k]^mat[i][j][k^1]; } } } return

[LLVMdev] Unrolling an arithmetic expression inside a loop

2010 Nov 23

Hello, I've been redirected from cfe-dev, as code optimizations in clang are done in the LLVM layer. I'm investigating how well clang optimizes the code it generates, and have come across the following example. I have two procedures: void exec0(const int *X, const int *Y, int *res, const int N) { int t1[N],t2[N],t3[N],t4[N],t5[N],t6[N]; for(int i = 0; i < N; i++) { t1[i] = X[i]+Y[i];

[LLVMdev] linux/x86-64 codegen support

2008 Feb 16

Interestingly, in the .i file there are 2 __builtin_alloca, and EmitBuiltinAlloca is only being called once. Andrew On 2/16/08, Andrew Lenharth <andrewl at lenharth.org> wrote: > libcpp/charset.c:631 turns into: > > %tmp16 = tail call i64 @strlen( i8* %to ) nounwind readonly > ; <i64> [#uses=1] > %tmp18 = tail call i64 @strlen( i8* %from ) nounwind

SelectionDAG::LegalizeTypes is very slow in 3.1 version

2016 Sep 27

In 3.1, the backend is very slow to legalize types. The following code snippet may be the culprit: %Result.i.i.i97 = alloca i33, align 8 %Result.i.i.i96 = alloca i33, align 8 %Result.i.i.i95 = alloca i33, align 8 %Result.i.i.i94 = alloca i33, align 8 %Result.i.i.i93 = alloca i33, align 8 %Result.i.i.i92 = alloca i33, align 8 %Result.i.i.i91 = alloca i33, align 8

[LLVMdev] Deleting Instructions after Intrinsic Creation

2008 Mar 04

Hi, I tried creating intrinsics that serve as placeholders for a set of instructions which should not be executed by the backend. I want to retain only intrinsic, phi, and terminator instructions in a basic block. I have taken care of the external dependencies of the basic block. How do I delete the rest of the instructions? Thank You Aditya P.S:

[LLVMdev] Deleting Instructions after Intrinsic Creation

2008 Mar 04
