thr3ads.net - llvm dev - [LLVMdev] Very slow performance of lli on x86 [Nov 2009]

Prasanth J

2009-Nov-15 07:52 UTC

[LLVMdev] Very slow performance of lli on x86

Hi all,

LLVM is built without debug enabled. Also, I am not forcing lli to use
interpreter mode, so I don't think the cause is a debug build or
interpreter mode.

*step 1:*
Compiled the 3 files (generic_replica.c, xacc.c and dacc.c) with clang-cc to
LLVM bytecode files using -emit-llvm-bc and the (-O0/-O3) options.
*step 2:*
The bytecode obtained from step 1 (generic_replica.bc, xacc.bc and dacc.bc) is
passed to the opt tool using the (-O0/-O3) options.
*step 3:*
The optimized bytecode obtained from step 2 (generic_replica.opt.bc, xacc.opt.bc
and dacc.opt.bc) is combined into a single bytecode file (monolith.bc) using
the llvm-ld tool.
*step 4:*
Ran monolith.bc for 10000 iterations using the lli tool and measured the
time.

I also tried using llvm-gcc for emitting bytecode in step 1 but got almost
the same result. As my entire setup is at the office, I can't attach my
makefile today. I will attach my entire setup tomorrow once I get back to the
office. I will also attach the configuration options I used for compiling
LLVM. Let me know in case I am wrong anywhere.
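
For reference, the four steps might correspond to command lines like the ones built below. This is only a sketch and nothing is executed: the tool names and the -emit-llvm-bc and -O0/-O3 options come from the description above, while the `-o` output arguments, the way llvm-ld is invoked, and the helper name `pipeline_commands` are assumptions made for illustration.

```python
# Sketch of the four steps as argv lists. Nothing is executed here.
# Assumptions (not stated in the thread): the "-o" output arguments,
# the exact llvm-ld invocation, and that llvm-ld writes monolith.bc.

SOURCES = ["generic_replica", "xacc", "dacc"]


def pipeline_commands(opt_level="-O3"):
    """Return the command lines for steps 1-4 of the described setup."""
    cmds = []
    for name in SOURCES:
        # Step 1: C source -> LLVM bytecode with clang-cc.
        cmds.append(["clang-cc", opt_level, "-emit-llvm-bc",
                     f"{name}.c", "-o", f"{name}.bc"])
    for name in SOURCES:
        # Step 2: pass each bytecode file through opt.
        cmds.append(["opt", opt_level, f"{name}.bc", "-o", f"{name}.opt.bc"])
    # Step 3: link the optimized files into one module with llvm-ld.
    cmds.append(["llvm-ld"] + [f"{n}.opt.bc" for n in SOURCES] + ["-o", "monolith"])
    # Step 4: run the linked bytecode with lli.
    cmds.append(["lli", "monolith.bc"])
    return cmds


if __name__ == "__main__":
    for cmd in pipeline_commands():
        print(" ".join(cmd))
```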

Thanks & Regards,
Prasanth J

On Sun, Nov 15, 2009 at 3:40 AM, Evan Cheng <evan.cheng at apple.com> wrote:
> He is probably using the interpreter on a debug build.
>
> Evan
>
>
> On Nov 14, 2009, at 1:40 PM, Eric Christopher <echristo at apple.com> wrote:
>
>
>>> for -O3 results refer attachment.
>>> time    clang (-O0)    llvm-gcc (-O0)    gcc (-O0)
>>> real    0m10.247s      0m11.324s         0m10.963s
>>> user    0m2.644s       0m2.478s          0m2.263s
>>> sys     0m5.949s       0m6.000s          0m5.953s
>>>
>>> llvm-jit
>>> i used clang-cc -O0 -emit-llvm-bc to emit llvm bytecode and then passed
>>> it to opt tool and then linked all bytecode files to single bytecode using
>>> llvm-ld, i used lli tool to run this single bytecode file and noticed the
>>> following output
>>> real      6m33.786s
>>> user      5m12.612s
>>> sys       1m1.205s
>>>
>>> why is lli taking such a loooong time to execute this particular piece of
>>> code.??
>>>
>>
>> Something's wrong on your machine or something. I did the same (but using
>> llvm-gcc for the .ll files).  Using a debug build of current ToT I got this:
>>
>> [ghostwheel:~/Desktop] echristo% time ~/builds/build-llvm-64bit/Debug/bin/lli foo.bc.bc
>> 0.210u 0.010s 0:00.22 100.0%    0+0k 0+0io 0pf+0w
>>
>>
>> That's a 64-bit build, but you'll notice the time difference. That said
>> I'm guessing that there's something missing since it takes no time to
>> execute. Step by step directions on what you did might help.
>>
>> -eric
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

Prasanth J

2009-Nov-15 07:55 UTC

[LLVMdev] Very slow performance of lli on x86

Sorry, I forgot to mention one thing: I downloaded the x86 binaries of
llvm+clang and llvm-gcc from the LLVM download site. I hope those are not
debug builds.

Prasanth J






Óscar Fuentes

2009-Nov-15 10:05 UTC

[LLVMdev] Very slow performance of lli on x86

Prasanth J <j.prasanth.j at gmail.com> writes:
> LLVM is built without debug enabled. Also i am not forcing lli to use
> interpreter mode. so i dont think the reason is not because of debug build
> or interpreter mode.
>
>
> *step 1: *
> compiled the 3 files (generic_replica.c ,xacc.c and dacc.c) with clang-cc to
> llvm bytecode files using -emit-llvm-bc and (-O0/-O3) options
> *step 2:*
> bytecode obtained from step 1 (generic_replica.bc, xacc.bc and dacc.bc) is
> passed to opt tool using (-O0/-O3) options
> *step 3:*
> optimized bytecode obtained from step 2 (generic_replica.opt.bc, xacc.opt.bc
> and dacc.opt.c) is combined to a single bytecode file (monolith.bc) using
> llvm-ld tool
> *step 4: *
> running monolith.bc for 10000 iterations using lli tool and measured the
> time.
So if I understand you correctly, you built executables with
llvm-gcc and clang, and ran each of them 10000 times, taking about 10
seconds. Then you generated some .bc files, optimized and combined them,
and invoked lli 10000 times with the resulting .bc file.

lli needs to generate the native code from the .bc file each time you
invoke it, so it is not a fair comparison, unless you are testing lli's
native code generation speed.

So if your program executes fast (<1 ms) when compiled with llvm-gcc
but has a moderately large (a few KB) .bc file, that could explain why
lli seems slow.
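
To put rough numbers on this, the wall-clock totals quoted earlier in the thread can be divided by the number of runs. The sketch below assumes that each of the 10000 iterations is a separate process launch, which the thread has not confirmed; `parse_time` and `per_run_ms` are names made up for this illustration.

```python
import re


def parse_time(text):
    """Convert a `time`-style value such as '6m33.786s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


RUNS = 10000  # assumption: every iteration is a separate process launch

# "real" totals quoted in the thread for the -O0 builds.
totals = {
    "clang": "0m10.247s",
    "llvm-gcc": "0m11.324s",
    "gcc": "0m10.963s",
    "lli": "6m33.786s",
}

# Average wall-clock time per launch, in milliseconds.
per_run_ms = {name: parse_time(t) / RUNS * 1000.0 for name, t in totals.items()}

if __name__ == "__main__":
    for name, ms in per_run_ms.items():
        print(f"{name:8s} {ms:7.3f} ms per run")
    print(f"lli / clang ratio: {per_run_ms['lli'] / per_run_ms['clang']:.1f}")
```

Under that assumption, lli averages roughly 39 ms per launch against roughly 1 ms for the native binaries, a gap that a fixed per-launch code generation cost could account for.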

If the .bc file is short then, for some unknown reason, lli may be using
the interpreter instead of generating and running native code.

Which operating system do you use? How large is the .bc file you pass to
lli? What is the output when you run your .bc file with the -stats command
line option passed to lli? Is there any difference if you also pass the
-force-interpreter option to lli?
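
One way to compare the two modes is to time repeated launches of each command line. The following is a minimal sketch, not the run script from the attachment: `time_runs` is a hypothetical helper, and the stand-in command only exists so that the example runs on a machine without LLVM installed.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs):
    """Launch `cmd` `runs` times and return the total wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. With LLVM installed,
    # replace it with ["lli", "monolith.bc"] and then with
    # ["lli", "-force-interpreter", "monolith.bc"] to compare the modes.
    cmd = [sys.executable, "-c", "pass"]
    runs = 5
    total = time_runs(cmd, runs)
    print(f"{runs} launches: {total:.3f} s total, "
          f"{total / runs * 1000:.1f} ms each")
```
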
> I also tried using llvm-gcc for emiting bytecode in step 1 but got almost
> the same output. As i have my entire setup in office i cant attach my
> makefile today. i will attach my entire setup tom once i get back to office.
> Also i will attach the configuration options i used for compiling LLVM. Let
> me know in case i am wrong anywhere.
-- 
Óscar

Garrison Venn

2009-Nov-15 11:42 UTC

[LLVMdev] [cfe-dev] Very slow performance of lli on x86

Granted, I'm not up on using bitcode files, but I don't believe the
debug build affects whether or not the JIT is used (non-interpretive
mode). Ignoring other debug build effects on the efficiency of the
jitted code, it would be interesting if you could also measure the
time to JIT alone, without actually executing the 10000 iterations. I
don't believe this would explain the time scale shown, but it should
have some effect. To my mind, the reported time scale also suggests
interpretive mode, which you might be able to force to see if this is
the culprit. I'll help test when you supply the build (make files).

Garrison


Eric Christopher

2009-Nov-15 19:36 UTC

[LLVMdev] Very slow performance of lli on x86

On Nov 14, 2009, at 11:52 PM, Prasanth J wrote:
> step 4:
> running monolith.bc for 10000 iterations using lli tool and measured the time.
How are you doing this?

-eric

Prasanth J

2009-Nov-16 06:44 UTC

[LLVMdev] Very slow performance of lli on x86

Hi all,

I have attached the complete test suite. It has separate directories for
gcc, llvm-gcc, clang and lli-clang. The source code, makefile and run script
(which contains the number of times the program should execute) for each
case are available inside each directory.

*FOLLOWING ARE THE STATISTICS WHILE USING LLI FOR A SINGLE ITERATION*

===-------------------------------------------------------------------------==  
... Statistics Collected ...
===-------------------------------------------------------------------------==
   58 dagcombine       - Number of dag nodes combined
16384 jit              - Number of bytes of global vars initialized
  357 jit              - Number of bytes of machine code compiled
    2 jit              - Number of global vars initialized
   27 jit              - Number of relocations applied
    3 jit              - Number of slabs of memory allocated by the JIT
  105 liveintervals    - Number of original intervals
   21 loop-reduce      - Number of IV uses strength reduced
    4 loop-reduce      - Number of PHIs inserted
    2 loop-reduce      - Number of loop terminating conds optimized
    1 machine-licm     - Number of machine instructions hoisted out of loops
    4 phielim          - Number of atomic phis lowered
    2 regalloc         - Number of copies coalesced
   27 regalloc         - Number of iterations performed
    3 regcoalescing    - Number of cross class joins performed
   44 regcoalescing    - Number of identity moves eliminated after coalescing
    1 regcoalescing    - Number of instructions re-materialized
   40 regcoalescing    - Number of interval joins performed
    2 scalar-evolution - Number of loops with predictable loop counts
    4 twoaddrinstr     - Number of instructions aggressively commuted
    6 twoaddrinstr     - Number of instructions commuted to coalesce
    3 twoaddrinstr     - Number of instructions re-materialized
   23 twoaddrinstr     - Number of two-address instructions
    2 virtregrewriter  - Number of copies elided
    1 x86-codegen      - Number of floating point instructions
   84 x86-emitter      - Number of machine instructions emitted


real    0m0.043s
user    0m0.027s
sys    0m0.010s


*FOLLOWING ARE THE STATISTICS WHILE FORCING LLI TO USE THE INTERPRETER FOR A SINGLE ITERATION*

===-------------------------------------------------------------------------==  
... Statistics Collected ...
===-------------------------------------------------------------------------==
147495 interpreter - Number of dynamic instructions executed
 17735 jit         - Number of bytes of global vars initialized
    49 jit         - Number of global vars initialized


real    0m0.083s
user    0m0.078s
sys    0m0.003s
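
The two -stats dumps are easier to compare once the counters are in a lookup table. Below is a small sketch of such a parser, assuming every counter line has the form "value, group, dash, description" as in the output above. The sample lines are copied from that output; `STAT_LINE` and `parse_stats` are names invented for this example.

```python
import re

# A counter line looks like:
#   "  357 jit              - Number of bytes of machine code compiled"
STAT_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+-\s+(.+?)\s*$")


def parse_stats(text):
    """Map (group, description) -> counter value; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            value, group, desc = match.groups()
            stats[(group, desc)] = int(value)
    return stats


# A few lines copied from the JIT run above.
JIT_SAMPLE = """\
16384 jit              - Number of bytes of global vars initialized
  357 jit              - Number of bytes of machine code compiled
   84 x86-emitter      - Number of machine instructions emitted
"""

# The counter lines from the forced-interpreter run above.
INTERP_SAMPLE = """\
147495 interpreter - Number of dynamic instructions executed
 17735 jit         - Number of bytes of global vars initialized
    49 jit         - Number of global vars initialized
"""

if __name__ == "__main__":
    jit = parse_stats(JIT_SAMPLE)
    interp = parse_stats(INTERP_SAMPLE)
    print(jit[("jit", "Number of bytes of machine code compiled")])
    print(interp[("interpreter", "Number of dynamic instructions executed")])
```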


Even for a single iteration, the execution time is pretty high
compared to gcc, llvm-gcc and clang.
What is the expected behavior when using lli? As I understand it,
lli performs runtime optimizations, so it should be faster than
clang and llvm-gcc. Am I right?

*My machine details are:*
*Linux localhost.localdomain 2.6.25-14.fc9.i686 #1 SMP Thu May 1 06:28:41 EDT 2008 i686 i686 i386 GNU/Linux*
*Memory: 1GB DDR2*
*CPU: Intel Pentium Dual-core @ 2.00 GHz*


Please let me know how I can proceed with this test.



Thanks and Regards,
Prasanth J




On Mon, Nov 16, 2009 at 1:06 AM, Eric Christopher <echristo at apple.com> wrote:
>
> On Nov 14, 2009, at 11:52 PM, Prasanth J wrote:
>
> > step 4:
> > running monolith.bc for 10000 iterations using lli tool and measured the time.
>
> How are you doing this?
>
> -eric
-------------- next part --------------
A non-text attachment was scrubbed...
Name: generic_asm.tgz
Type: application/x-gzip
Size: 62726 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20091116/918a9562/attachment.bin>
