Hi all,

LLVM is built without debug enabled, and I am not forcing lli to use interpreter mode, so I don't think the slowdown is caused by a debug build or by interpreter mode.

*step 1:*
Compiled the 3 files (generic_replica.c, xacc.c and dacc.c) with clang-cc to LLVM bytecode files using the -emit-llvm-bc and (-O0/-O3) options.
*step 2:*
The bytecode obtained from step 1 (generic_replica.bc, xacc.bc and dacc.bc) is passed to the opt tool using the (-O0/-O3) options.
*step 3:*
The optimized bytecode obtained from step 2 (generic_replica.opt.bc, xacc.opt.bc and dacc.opt.bc) is combined into a single bytecode file (monolith.bc) using the llvm-ld tool.
*step 4:*
Ran monolith.bc for 10000 iterations using the lli tool and measured the time.

I also tried using llvm-gcc for emitting the bytecode in step 1 but got almost the same output. As my entire setup is at the office, I can't attach my makefile today. I will attach my entire setup tomorrow once I get back to the office, along with the configuration options I used for compiling LLVM. Let me know in case I am wrong anywhere.

Thanks & Regards,
Prasanth J

On Sun, Nov 15, 2009 at 3:40 AM, Evan Cheng <evan.cheng at apple.com> wrote:
> He is probably using the interpreter on a debug build.
>
> Evan
>
> On Nov 14, 2009, at 1:40 PM, Eric Christopher <echristo at apple.com> wrote:
>
>>> for -O3 results refer attachment.
>>>
>>> time    clang (-O0)   llvm-gcc (-O0)   gcc (-O0)
>>> real    0m10.247s     0m11.324s        0m10.963s
>>> user    0m2.644s      0m2.478s         0m2.263s
>>> sys     0m5.949s      0m6.000s         0m5.953s
>>>
>>> llvm-jit
>>> i used clang-cc -O0 -emit-llvm-bc to emit llvm bytecode and then passed
>>> it to opt tool and then linked all bytecode files to single bytecode using
>>> llvm-ld, i used lli tool to run this single bytecode file and noticed the
>>> following output
>>>
>>> real    6m33.786s
>>> user    5m12.612s
>>> sys     1m1.205s
>>>
>>> why is lli taking such a loooong time to execute this particular piece of
>>> code.??
>>
>> Something's wrong on your machine or something. I did the same (but using
>> llvm-gcc for the .ll files). Using a debug build of current ToT I got this:
>>
>> [ghostwheel:~/Desktop] echristo% time ~/builds/build-llvm-64bit/Debug/bin/lli foo.bc.bc
>> 0.210u 0.010s 0:00.22 100.0% 0+0k 0+0io 0pf+0w
>>
>> That's a 64-bit build, but you'll notice the time difference. That said
>> I'm guessing that there's something missing since it takes no time to
>> execute. Step by step directions on what you did might help.
>>
>> -eric
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Sorry, I forgot to mention one thing: I downloaded the X86 binaries of llvm+clang and llvm-gcc from the LLVM download site. I hope that is not a debug build.

Prasanth J

On Sun, Nov 15, 2009 at 1:22 PM, Prasanth J <j.prasanth.j at gmail.com> wrote:
> Hi all,
>
> LLVM is built without debug enabled. Also i am not forcing lli to use
> interpreter mode.
> [...]
Prasanth J <j.prasanth.j at gmail.com> writes:

> LLVM is built without debug enabled. Also i am not forcing lli to use
> interpreter mode. so i dont think the reason is not because of debug build
> or interpreter mode.
>
> *step 1: *
> compiled the 3 files (generic_replica.c ,xacc.c and dacc.c) with clang-cc to
> llvm bytecode files using -emit-llvm-bc and (-O0/-O3) options
> *step 2:*
> bytecode obtained from step 1 (generic_replica.bc, xacc.bc and dacc.bc) is
> passed to opt tool using (-O0/-O3) options
> *step 3:*
> optimized bytecode obtained from step 2 (generic_replica.opt.bc, xacc.opt.bc
> and dacc.opt.bc) is combined to a single bytecode file (monolith.bc) using
> llvm-ld tool
> *step 4: *
> running monolith.bc for 10000 iterations using lli tool and measured the
> time.

So if I understand you correctly, you built executables with llvm-gcc and clang and ran them 10000 times, taking about 10 seconds. Then you generated some .bc files, combined and optimized them, and invoked lli 10000 times with the resulting .bc file.

lli needs to generate the native code from the .bc file each time you invoke it, so it is not a fair comparison, unless you are testing lli's native code generation speed. So if your program executes fast (<1 ms) when compiled with llvm-gcc but has a moderately large (a few KB) .bc file, that could explain why lli seems slow.

If the .bc file is short then, for some unknown reason, lli may be using the interpreter instead of generating and running native code.

Which operating system do you use?

How large is the .bc file you pass to lli?

What's the output of running your .bc file when passing the command line option -stats to lli?

Is there any difference if you pass the -force-interpreter option to lli too?

-- Óscar
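The per-invocation overhead described above can be estimated from the 'real' times already quoted in the thread. The following is only a rough sketch: the helper names (parse_time, per_run_ms) are made up for illustration, and it assumes all four totals cover the same 10000 separate invocations at -O0.

```python
import re

RUNS = 10000  # iteration count stated in the thread

# 'real' times quoted in the thread for 10000 runs at -O0.
REPORTED = {
    "clang": "0m10.247s",
    "llvm-gcc": "0m11.324s",
    "gcc": "0m10.963s",
    "lli": "6m33.786s",
}


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '6m33.786s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def per_run_ms(total: str, runs: int = RUNS) -> float:
    """Average wall-clock cost of one invocation, in milliseconds."""
    return parse_time(total) / runs * 1000.0


if __name__ == "__main__":
    for tool, total in REPORTED.items():
        print(f"{tool:9s} {per_run_ms(total):8.3f} ms per run")
```

With these figures the native executables average roughly 1 ms per invocation and lli roughly 39 ms. That gap would be consistent with a fixed startup and code generation cost paid on every lli invocation, although the numbers alone do not prove where the time goes.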
Garrison Venn
2009-Nov-15 11:42 UTC
[LLVMdev] [cfe-dev] Very slow performance of lli on x86
Granted, I'm not up on using bitcode files, but I don't believe the debug build affects whether or not the JIT is used (non-interpretive mode). Ignoring other debug build effects on the efficiency of the JITted code, it would be interesting if you could also measure the time to JIT alone, without actually executing the 10000 iterations. I don't believe this would explain the time scale shown, but it should have some effect. To my mind, the reported time scale also suggests interpretive mode, which you might be able to force to see if this is the culprit.

I'll help test when you supply the build (makefiles).

Garrison

On Nov 15, 2009, at 2:55, Prasanth J wrote:
> Sorry i really forgot to mention one thing. I downloaded the X86
> binaries of llvm+clang and llvm-gcc from llvm download site. i hope
> that is not a debug build.
> [...]
On Nov 14, 2009, at 11:52 PM, Prasanth J wrote:
> step 4:
> running monolith.bc for 10000 iterations using lli tool and measured the time.

How are you doing this?

-eric
Hi all,

I have attached the complete test suite. It has separate directories for gcc, llvm-gcc, clang and lli-clang. The source code, makefile and run script (which contains the number of times the program should execute) for each case are available inside each directory.

*FOLLOWING ARE THE STATISTICS WHILE USING LLI FOR SINGLE ITERATION*

===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===

    58 dagcombine       - Number of dag nodes combined
 16384 jit              - Number of bytes of global vars initialized
   357 jit              - Number of bytes of machine code compiled
     2 jit              - Number of global vars initialized
    27 jit              - Number of relocations applied
     3 jit              - Number of slabs of memory allocated by the JIT
   105 liveintervals    - Number of original intervals
    21 loop-reduce      - Number of IV uses strength reduced
     4 loop-reduce      - Number of PHIs inserted
     2 loop-reduce      - Number of loop terminating conds optimized
     1 machine-licm     - Number of machine instructions hoisted out of loops
     4 phielim          - Number of atomic phis lowered
     2 regalloc         - Number of copies coalesced
    27 regalloc         - Number of iterations performed
     3 regcoalescing    - Number of cross class joins performed
    44 regcoalescing    - Number of identity moves eliminated after coalescing
     1 regcoalescing    - Number of instructions re-materialized
    40 regcoalescing    - Number of interval joins performed
     2 scalar-evolution - Number of loops with predictable loop counts
     4 twoaddrinstr     - Number of instructions aggressively commuted
     6 twoaddrinstr     - Number of instructions commuted to coalesce
     3 twoaddrinstr     - Number of instructions re-materialized
    23 twoaddrinstr     - Number of two-address instructions
     2 virtregrewriter  - Number of copies elided
     1 x86-codegen      - Number of floating point instructions
    84 x86-emitter      - Number of machine instructions emitted

real    0m0.043s
user    0m0.027s
sys     0m0.010s

*FOLLOWING ARE THE STATISTICS WHILE FORCING LLI TO USE INTERPRETER FOR SINGLE ITERATION*

===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===

147495 interpreter      - Number of dynamic instructions executed
 17735 jit              - Number of bytes of global vars initialized
    49 jit              - Number of global vars initialized

real    0m0.083s
user    0m0.078s
sys     0m0.003s

Even for a single iteration, the time taken for execution is pretty high when compared to gcc, llvm-gcc and clang. What should be the expected behavior while using lli? As per my understanding, since lli does runtime optimizations it should be faster than clang and llvm-gcc. Am I right?

*My machine details are*
*Linux localhost.localdomain 2.6.25-14.fc9.i686 #1 SMP Thu May 1 06:28:41 EDT 2008 i686 i686 i386 GNU/Linux*
*Memory: 1GB DDR2
CPU: Intel Pentium Dual-core @ 2.00 GHz*

Please let me know how I can proceed with this test.

Thanks and Regards,
Prasanth J

On Mon, Nov 16, 2009 at 1:06 AM, Eric Christopher <echristo at apple.com> wrote:
>
> On Nov 14, 2009, at 11:52 PM, Prasanth J wrote:
>
> > step 4:
> > running monolith.bc for 10000 iterations using lli tool and measured the
> > time.
>
> How are you doing this?
>
> -eric

-------------- next part --------------
A non-text attachment was scrubbed...
Name: generic_asm.tgz
Type: application/x-gzip
Size: 62726 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091116/918a9562/attachment.bin>
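The two -stats dumps above can be compared mechanically. This is a minimal sketch: parse_stats and project_total are hypothetical helper names, the "count group - description" line layout is assumed from the output as posted, and only a few counters from the message are reproduced.

```python
import re

# One statistics line looks like: "  357 jit - Number of bytes of machine code compiled"
STAT_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+-\s+(.+?)\s*$")

# Excerpts copied from the two single-iteration runs posted in the thread.
JIT_RUN = """\
  357 jit - Number of bytes of machine code compiled
   27 jit - Number of relocations applied
   84 x86-emitter - Number of machine instructions emitted
"""
INTERPRETER_RUN = """\
147495 interpreter - Number of dynamic instructions executed
    49 jit - Number of global vars initialized
"""


def parse_stats(text: str) -> dict:
    """Map (group, description) to the integer counter on each statistics line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:  # banner and timing lines do not start with a count, so they are skipped
            value, group, description = match.groups()
            stats[(group, description)] = int(value)
    return stats


def project_total(single_run_seconds: float, runs: int) -> float:
    """Naive projection: every invocation costs the same as the measured one."""
    return single_run_seconds * runs


if __name__ == "__main__":
    jit = parse_stats(JIT_RUN)
    interp = parse_stats(INTERPRETER_RUN)
    print("machine code bytes (JIT run):",
          jit[("jit", "Number of bytes of machine code compiled")])
    print("dynamic instructions (interpreter run):",
          interp[("interpreter", "Number of dynamic instructions executed")])
    # 0.043 s is the 'real' time of the single JIT run reported above.
    print("projected seconds for 10000 runs:", project_total(0.043, 10000))
```

The first dump contains code generation counters and no interpreter counter, which suggests that run did use the JIT. Projecting the 0.043 s single-run time over 10000 invocations gives about 430 s, the same order as the 6m33.786s (about 394 s) reported earlier in the thread.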