similar to: [LLVMdev] Failure to optimize ? operator

2011 Dec 14
2
[LLVMdev] Failure to optimize ? operator
I don't understand your point. Which version is better does NOT depend on what inputs are passed to the function. The code LLVM generates for f1 will never take longer to execute than the code for f2: for x > 0, T(f1) < T(f2); for x <= 0, T(f1) = T(f2), where T() is the time to execute the given function. So T(f1) <= T(f2) always holds. I would call this a missed
2011 Dec 14
0
[LLVMdev] Failure to optimize ? operator
On Tue, Dec 13, 2011 at 5:59 AM, Brent Walker <brenthwalker at gmail.com> wrote: > The following seemingly identical functions get compiled to quite > different machine code.  The first is correctly optimized (the > computation of var y is nicely moved into the else branch of the "if" > statement), while the second one is not (the full computation of var y > is
2012 Nov 29
2
[LLVMdev] operator overloading fails while debugging with gdb for i386
For the given test: class A1 { int x; int y; public: A1(int a, int b) { x=a; y=b; } A1 operator+(const A1&); }; A1 A1::operator+(const A1& second) { A1 sum(0,0); sum.x = x + second.x; sum.y = y + second.y; return (sum); } int main (void) { A1 one(2,3); A1 two(4,5); return 0; } When the executable of this code is debugged in gdb for i386, we don't get the
2012 Dec 01
0
[LLVMdev] operator overloading fails while debugging with gdb for i386
The problem does not seem to be limited to operator overloading; it also occurs when a struct is returned by value. While debugging, gdb expects the return value in eax. gcc does return it in eax, but Clang returns it in edx (this can be checked in gdb by printing the contents of edx). Sample code: struct A1 { int x; int y; }; A1 sum(const A1 one, const A1 two) { A1 plus = {0,0}; plus.x = one.x + two.x; plus.y
2019 Aug 08
2
Suboptimal code generated by clang+llc in quite a common scenario (?)
I found something that I don't quite understand when compiling a common piece of code with the -Os flag. I found it while testing my own backend, but then I dug deeper and found that at least x86 is affected as well. This is the code in question: char pp[3]; char *scscx = pp; int tst( char i, char j, char k ) { scscx[0] = i; scscx[1] = j; scscx[2] = k; return 0; } The above gets
2012 Dec 01
2
[LLVMdev] operator overloading fails while debugging with gdb for i386
Hi, Structures are passed by pointer, so the return value is not actually in eax. That code gets transformed into something like: void sum(A1 *out, const A1 one, const A1 two) { out->x = one.x + two.x; out->y = one.y + two.y; } The function therefore ends up returning void and operating on a hidden parameter, so %eax is dead at the end of the function and should not be relied
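
The transformation described in this message can be written out by hand in C. The following is only a sketch: sum_lowered is a hypothetical name, and what a given compiler and ABI actually emit (including what, if anything, ends up in eax) is not shown here.

```c
#include <assert.h>

/* Same layout as the A1 struct quoted in the thread. */
struct A1 { int x; int y; };

/* Source-level form: the struct is returned by value. */
struct A1 sum(const struct A1 one, const struct A1 two) {
    struct A1 plus = {0, 0};
    plus.x = one.x + two.x;
    plus.y = one.y + two.y;
    return plus;
}

/* Hand-written equivalent of the lowering the message describes: the
   caller passes the address of the result slot as an extra first
   argument and the function itself returns void.
   The name sum_lowered is hypothetical. */
void sum_lowered(struct A1 *out, const struct A1 one, const struct A1 two) {
    assert(out != 0);
    out->x = one.x + two.x;
    out->y = one.y + two.y;
}
```

Both forms compute the same result; the difference is only in where the caller finds it, which is the point a debugger has to agree on with the compiler.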
2011 Mar 24
2
[LLVMdev] GCC vs. LLVM difference on simple code example
Hi, I have a question on why gcc and llvm-gcc compile the following simple code snippet differently: extern int a; extern int *b; void foo() { int i; for (i = 1; i < 100; ++i) a += b[i]; } gcc compiles this function hoisting the load of the global variable "b" outside of the loop, while llvm-gcc keeps it inside the loop. This results in slower code on the part of
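
For comparison, the hoisting that the message attributes to gcc can be done by hand. This is a sketch under two assumptions: the globals are defined here (the post only declares them extern) so that the example is self-contained, and the store to a never modifies b. The name foo_hoisted is hypothetical.

```c
#include <assert.h>

/* Defined here only to keep the sketch self-contained;
   the post declares both as extern. */
int a;
int *b;

/* Loop as written in the post: b is named inside the loop body, so a
   compiler that does not hoist the load re-reads it every iteration. */
void foo(void) {
    int i;
    for (i = 1; i < 100; ++i)
        a += b[i];
}

/* Hand-hoisted variant: b is read once before the loop.
   This is only equivalent if writing to a cannot change b. */
void foo_hoisted(void) {
    int *base = b;
    int i;
    assert(base != 0);
    for (i = 1; i < 100; ++i)
        a += base[i];
}
```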
2005 Feb 22
5
[LLVMdev] Area for improvement
I noticed that fourinarow is one of the programs in which LLVM is much slower than GCC, so I decided to take a look and see why that is so. The program has many loops that look like this: #define ROWS 6 #define COLS 7 void init_board(char b[COLS][ROWS+1]) { int i,j; for (i=0;i<COLS;i++) for (j=0;j<ROWS;j++) b[i][j]='.';
2010 Jan 04
0
[LLVMdev] Tail Call Optimisation
On Monday 04 January 2010 05:16:40 Jeffrey Yasskin wrote: > On Sun, Jan 3, 2010 at 10:50 PM, Jon Harrop <jon at ffconsultancy.com> wrote: > > LLVM's TCO already handles mutual recursion. > > Only for fastcc functions Yes. > compiled with -tailcallopt, right? If you use the compiler, yes. > http://llvm.org/docs/CodeGenerator.html#tailcallopt > > I believe
2005 Feb 22
0
[LLVMdev] Area for improvement
When I increased COLS to the point where the loop could no longer be unrolled, the selection dag code generator generated effectively the same code as the default X86 code generator. Lots of redundant imul/movl/addl sequences. It can't clean it up either. Only unrolling all nested loops permits it to be optimized away, regardless of code generator. Jeff Cohen wrote: > I noticed
2010 Jan 04
2
[LLVMdev] Tail Call Optimisation
On Sun, Jan 3, 2010 at 10:50 PM, Jon Harrop <jon at ffconsultancy.com> wrote: > On Monday 04 January 2010 03:33:06 Simon Harris wrote: >> On 04/01/2010, at 3:01 PM, Jon Harrop wrote: >> > I am certainly interested in tail calls because my HLVM project relies >> > upon LLVM's tail call elimination. However, I do not understand what tail >> > calls LLVM
2002 Nov 07
5
From RISKS: secret scrubbing code removed by optimizers
This showed up in RISKS and no one has mentioned it here yet, so.. OpenSSH contains lots of code like: char *password = read_passphrase(prompt, 0); [do stuff] memset(password, 0, strlen(password));
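
One common workaround is to perform the scrubbing stores through a volatile pointer, which compilers in practice do not eliminate. The sketch below is not taken from OpenSSH; secure_zero and scrub_passphrase are hypothetical names. Some platforms also provide explicit_bzero, and C11 defines an optional memset_s, for the same purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero a buffer through a volatile pointer so
   that each store is treated as an observable access, unlike a plain
   memset on memory that is never read again. */
void secure_zero(void *buf, size_t len) {
    volatile unsigned char *p = (volatile unsigned char *)buf;
    assert(buf != NULL || len == 0);
    while (len--) {
        *p++ = 0;
    }
}

/* Mirrors the pattern quoted from RISKS: the length is taken from the
   string itself before it is overwritten. */
void scrub_passphrase(char *password) {
    secure_zero(password, strlen(password));
}
```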
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
For example, I have the following IR code, for.cond.preheader: ; preds = %if.end18 %mul = mul i32 %12, %3 %cmp21128 = icmp sgt i32 %mul, 0 br i1 %cmp21128, label %for.body.preheader, label %return for.body.preheader: ; preds = %for.cond.preheader %19 = mul i32 %12, %3 %20 = add i32 %19, -1 %21 = zext i32 %20 to i64 %22 =
2011 Jul 06
0
[LLVMdev] code generation removes duplicated instructions
On 6 July 2011 02:31, D S Khudia <daya.khudia at gmail.com> wrote: >   %0 = load i32* %i, align 4 >   %HV14_ = getelementptr inbounds [100 x i32]* %a, i32 0, i32 %0 >   %1 = getelementptr inbounds [100 x i32]* %a, i32 0, i32 %0 >   %HVCmp7 = icmp ne i32* %1, %HV14_ >   br i1 %HVCmp7, label %relExit, label %bb.split > > So that HV14_ is a new instruction and I am
2005 Feb 22
0
[LLVMdev] Area for improvement
On Mon, 21 Feb 2005, Jeff Cohen wrote: > I noticed that fourinarow is one of the programs in which LLVM is much slower > than GCC, so I decided to take a look and see why that is so. The program > has many loops that look like this: > > #define ROWS 6 > #define COLS 7 > > void init_board(char b[COLS][ROWS+1]) > { > int i,j; > > for
2009 Aug 18
0
[LLVMdev] Build issues on Solaris
Hello, Nathan > or if it should be a configure test, which might be safer. Are there > any x86 platforms (other than apple) that don't need PLT-indirect calls? Yes, mingw. However just tweaking the define is not enough - we're not loading address of GOT into ebx before the call (on 32 bit ABIs) thus the call will be to nowhere. -- With best regards, Anton Korobeynikov Faculty of
2005 Feb 22
2
[LLVMdev] Area for improvement
Sorry, I thought I was running selection dag isel but I screwed up when trying out the really big array. You're right, it does clean it up except for the multiplication. So LoopStrengthReduce is not ready for prime time and doesn't actually get used? I might consider whipping it into shape. Does it still have to handle getelementptr in its full generality? Chris Lattner wrote:
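
The multiplication mentioned here comes from computing the row offset for b[i][j] in the init_board loop quoted in this thread. A hand strength-reduced version, sketched below, replaces that per-access multiply with a row pointer that advances once per outer iteration. This only illustrates the idea; it makes no claim about what LoopStrengthReduce actually produces, and init_board_reduced is a hypothetical name.

```c
#include <assert.h>

#define ROWS 6
#define COLS 7

/* Loop nest as quoted in the thread. */
void init_board(char b[COLS][ROWS + 1]) {
    int i, j;
    for (i = 0; i < COLS; i++)
        for (j = 0; j < ROWS; j++)
            b[i][j] = '.';
}

/* Hand strength-reduced variant: instead of deriving the address of
   row i from i * (ROWS + 1) on every access, keep a pointer to the
   current row and step it by one row per outer iteration. */
void init_board_reduced(char b[COLS][ROWS + 1]) {
    char (*row)[ROWS + 1] = b;
    int i, j;
    assert(b != 0);
    for (i = 0; i < COLS; i++, row++)
        for (j = 0; j < ROWS; j++)
            (*row)[j] = '.';
}
```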
2011 Jul 06
2
[LLVMdev] code generation removes duplicated instructions
Hello, I am duplicating few instructions in a basic block and splitting it. The following is an example. bb: ; preds = %bb1 %0 = load i32* %i, align 4 %1 = getelementptr inbounds [100 x i32]* %a, i32 0, i32 %0 store i32 0, i32* %1, align 4 %2 = load i32* %i, align 4 %3 = getelementptr inbounds [100 x i32]* %last_added, i32 0, i32 %2 store
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. So in vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64-bit constant values, which are indexes here (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations, and zmm2 contains the constant 4000. So vpmuludq zmm14, zmm10, zmm2 will multiply the index values by 4000, since the stride for array b is 4000. zmm14=
2009 Aug 11
6
[LLVMdev] Build issues on Solaris
Hi all, I've encountered a couple of minor build issues on Solaris that have crept in since 2.5, fixes below: 1. In lib/Target/X86/X86JITInfo.cpp, there is: // Check if building with -fPIC #if defined(__PIC__) && __PIC__ && defined(__linux__) #define ASMCALLSUFFIX "@PLT" #else #define ASMCALLSUFFIX #endif Which causes a link failure due to the non-PLT