The following seemingly identical functions, get compiled to quite different machine code. The first is correctly optimized (the computation of var y is nicely moved into the else branch of the "if" statement), which the second one is not (the full computation of var y is always done). The output was produced using the demo page on llvm's web site (optimization level LTO). Can someone shed some light on this strange behavior? Thanks, Brent ==============================================================int f1(int x) { int y = 5*x*x*x+2*x*x+3*x+1; if (x > 0) return 0; else return y; } int f2(int x) { int y = 5*x*x*x+2*x*x+3*x+1; return x > 0 ? 0 : y; } ===============================================================Output from llvm disassembler targeting LLVM assembly ; ModuleID = '/tmp/webcompile/_11262_0.bc' target datalayout "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" target triple = "x86_64-unknown-linux-gnu" define i32 @f1(i32 %x) nounwind readnone { ; <label>:0 %1 = icmp sgt i32 %x, 0 br i1 %1, label %7, label %2 ; <label>:2 ; preds = %0 %3 = mul i32 %x, 5 %tmp = add i32 %3, 2 %tmp1 = mul i32 %tmp, %x %4 = add i32 %tmp1, 3 %5 = mul i32 %4, %x %6 = add nsw i32 %5, 1 br label %7 ; <label>:7 ; preds = %2, %0 %.0 = phi i32 [ %6, %2 ], [ 0, %0 ] ret i32 %.0 } define i32 @f2(i32 %x) nounwind readnone { %1 = mul i32 %x, 5 %tmp = add i32 %1, 2 %tmp1 = mul i32 %tmp, %x %2 = add i32 %tmp1, 3 %3 = mul i32 %2, %x %4 = add nsw i32 %3, 1 %5 = icmp sgt i32 %x, 0 %6 = select i1 %5, i32 0, i32 %4 ret i32 %6 } ===================================================
On Tue, Dec 13, 2011 at 5:59 AM, Brent Walker <brenthwalker at gmail.com> wrote:> The following seemingly identical functions, get compiled to quite > different machine code. The first is correctly optimized (the > computation of var y is nicely moved into the else branch of the "if" > statement), which the second one is not (the full computation of var y > is always done). The output was produced using the demo page on > llvm's web site (optimization level LTO). > > Can someone shed some light on this strange behavior?The IR you're seeing is the IR we naturally generate for the given IR, and LLVM doesn't have an optimization to turn your f2 into f1. Which version is better depends on the values passed into the function, which makes it a bit tricky to write an optimization to turn one into the other. -Eli
Hi folks, I'm interested in building a secure and verifiable operating system. One good way to do it is to build the OS with high level language, such as Java or C# since verification is significantly easier. I have some experience on analyzing LLVM byte codes, so I also plan to compile everything into LLVM byte codes to analyze and verify my OS. However, I also plan to write my own runtime, since I don't want to rely on GC (I'm considering something like region-based memory management). My question is what's the status of high level language support in LLVM? To my best knowledge, (1) there's a java class to llvm byte code translator in the SVN. (2) the VMKit project claims that it has a good JVM. I appreciate if you could give me some more information. Comments and suggestions are also appreciated. Thanks! ~Haohui
On Tue, Dec 13, 2011 at 5:55 PM, Haohui Mai <haohui.mai at gmail.com> wrote:> Hi folks, > > I'm interested in building a secure and verifiable operating system. One good way to do it is to build the OS with high level language, such as Java or C# since verification is significantly easier. > > I have some experience on analyzing LLVM byte codes, so I also plan to compile everything into LLVM byte codes to analyze and verify my OS. > > However, I also plan to write my own runtime, since I don't want to rely on GC (I'm considering something like region-based memory management). > > My question is what's the status of high level language support in LLVM?The question isn't really clear. What features are you looking for? -Eli
I don't understand your point. Which version is better does NOT depend on what inputs are passed to the function. The compiled code for (as per llvm) f1 will always take less time to execute than f2. for x > 0 => T(f1) < T(f2) for x <= 0 => T(f1) = T(f2) where T() is the time to execute the given function. So always T(f1) <= T(f2). I would call this a missed optimization opportunity. I think it warrants a bug report. If I do the same experiment with gcc I get identical code for the two functions: =============================================_f1: pushl %ebp xorl %eax, %eax movl %esp, %ebp movl 8(%ebp), %edx testl %edx, %edx jle L5 popl %ebp ret .p2align 4,,7L5: movl %edx, %ecx imull %edx, %ecx popl %ebp leal 3(%ecx,%ecx,4), %eax imull %edx, %eax leal 1(%eax,%ecx,2), %eax ret .p2align 4,,15 _f2: pushl %ebp xorl %eax, %eax movl %esp, %ebp movl 8(%ebp), %edx testl %edx, %edx jle L9 popl %ebp ret .p2align 4,,7L9: movl %edx, %ecx imull %edx, %ecx popl %ebp leal 3(%ecx,%ecx,4), %eax imull %edx, %eax leal 1(%eax,%ecx,2), %eax ret============================================= Brent On Wed, Dec 14, 2011 at 9:58 AM, Eli Friedman <eli.friedman at gmail.com> wrote:> On Tue, Dec 13, 2011 at 5:59 AM, Brent Walker <brenthwalker at gmail.com> wrote: >> The following seemingly identical functions, get compiled to quite >> different machine code. The first is correctly optimized (the >> computation of var y is nicely moved into the else branch of the "if" >> statement), which the second one is not (the full computation of var y >> is always done). The output was produced using the demo page on >> llvm's web site (optimization level LTO). >> >> Can someone shed some light on this strange behavior? > > The IR you're seeing is the IR we naturally generate for the given IR, > and LLVM doesn't have an optimization to turn your f2 into f1. Which > version is better depends on the values passed into the function, > which makes it a bit tricky to write an optimization to turn one into > the other. > > -Eli
Seemingly Similar Threads
- [LLVMdev] Failure to optimize ? operator
- [LLVMdev] Failure to optimize ? operator
- [LLVMdev] operator overloading fails while debugging with gdb for i386
- [LLVMdev] operator overloading fails while debugging with gdb for i386
- Suboptimal code generated by clang+llc in quite a common scenario (?)