thr3ads.net - llvm dev - [llvm-dev] Where is opt spending its time? [Mar 2016]

David Jones via llvm-dev

2016-Mar-09 15:52 UTC

[llvm-dev] Where is opt spending its time?

I am trying to improve my application's compile-time performance.

On a given workload, I take 68 seconds to compile some code. If I disable
the LLVM code generation (i.e. I will generate IR instructions, but skip
the LLVM optimization and instruction selection steps) then my compile time
drops to 3 seconds.  If I write out the LLVM IR (just to prove that I am
generating it) then my compile time is 4 seconds. We're spending >90% of
the time in LLVM code generation.

To try to determine if there's anything I can do, I ran:

 time /tools/llvm/3.7.1dbg/bin/opt -O1 -filetype=obj -o opt.o my_ir.ll -time-passes

and I get:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 19.1382 seconds (19.1587 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   4.4755 ( 23.5%)   0.0000 (  0.0%)   4.4755 ( 23.4%)   4.4806 ( 23.4%)  Dead Store Elimination
   3.6255 ( 19.0%)   0.0000 (  0.0%)   3.6255 ( 18.9%)   3.6282 ( 18.9%)  Combine redundant instructions
   1.2138 (  6.4%)   0.0040 (  5.0%)   1.2178 (  6.4%)   1.2185 (  6.4%)  SROA
...
real    1m7.783s
user    1m7.548s
sys     0m0.183s

So: opt reports that it took 19 seconds, but overall, the run took 68
seconds. The system in question is a 6-core AMD K10 with 8GB of memory. The
system is not running anything else at the time.
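
The size of the gap can be worked out directly from the figures above. The following is a minimal Python sketch, not part of the original message: it assumes the three pass rows shown in the report (the rest are elided), the reported wall-clock total, and the `real` line from `time`. The names `parse_rows` and `parse_shell_time` are hypothetical helpers introduced only for this illustration.

```python
import re

# Rows copied from the report above, with the mail client's line wraps removed.
REPORT = """\
   4.4755 ( 23.5%)   0.0000 (  0.0%)   4.4755 ( 23.4%)   4.4806 ( 23.4%)  Dead Store Elimination
   3.6255 ( 19.0%)   0.0000 (  0.0%)   3.6255 ( 18.9%)   3.6282 ( 18.9%)  Combine redundant instructions
   1.2138 (  6.4%)   0.0040 (  5.0%)   1.2178 (  6.4%)   1.2185 (  6.4%)  SROA
"""
REPORTED_WALL = 19.1587  # "Total Execution Time ... (19.1587 wall clock)"
REAL = "1m7.783s"        # "real" line printed by `time`

# One row: four "seconds ( percent%)" columns, then the pass name.
ROW = re.compile(r"^\s*" + r"([\d.]+)\s*\(\s*[\d.]+%\)\s*" * 4 + r"(.+?)\s*$")


def parse_rows(text):
    """Return (pass name, wall seconds) for every line that matches ROW."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(5), float(m.group(4))))
    return rows


def parse_shell_time(value):
    """Convert a `time` value such as '1m7.783s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", value.strip())
    if m is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(m.group(1) or 0) * 60 + float(m.group(2))


rows = parse_rows(REPORT)
listed_wall = sum(wall for _, wall in rows)
real_seconds = parse_shell_time(REAL)
unaccounted = real_seconds - REPORTED_WALL

# Slowest listed passes first, then the totals.
for name, wall in sorted(rows, key=lambda r: r[1], reverse=True):
    print(f"{wall:8.4f} s  {name}")
print(f"listed passes : {listed_wall:.4f} s")
print(f"reported total: {REPORTED_WALL:.4f} s")
print(f"process total : {real_seconds:.3f} s")
print(f"unaccounted   : {unaccounted:.3f} s ({unaccounted / real_seconds:.0%})")
```

With these inputs the timing report covers about 19.2 of 67.8 seconds, leaving about 48.6 seconds (roughly 72% of the run) with no line item.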

What activity accounts for the unaccounted-for time?

For my application, IR verification has pathological performance (I ought
to file a bug on that), therefore I disable it. It is not clear if the IR
verifier is running in my opt runs. There is no line item for it.

It is also not clear if opt does instruction selection. I tried specifying
-filetype=null but that makes no difference to the run time.

Justin Lebar via llvm-dev

2016-Mar-09 21:39 UTC

> What activity accounts for the unaccounted-for time?
If you're on Linux, consider using a proper CPU profiler, such as
perf(1).  It's really easy to use -- on x86-64, compiling
RelWithDebInfo with -fno-omit-frame-pointer has given me excellent
results.

Good luck.

koffie drinker via llvm-dev

2016-Mar-10 10:04 UTC

Hi,

I'm having the same issue. You can speed up the JIT by disabling the
code-gen optimizations when creating the execution engine:
.setOptLevel(llvm::CodeGenOpt::None)
and by trying to enable fast instruction selection via
.setTargetOptions

But even with the above applied, my profiler (in release mode, of course)
still shows a lot of time (86%) spent in JIT code gen.
It's also weird that when I look at the individual functions in the
profile, malloc and free take up 80% of the total time.
40% of it comes from a SmallVectorImpl resize in the pass manager.

The modules generally contain around 3 small functions, so it should be fast.
For my project, fast JIT time is more important than the actual runtime,
since the statements are "simple". I do run a pass manager on functions to
optimize the IR.

So what is generally the best approach when you require fast code
generation? Specifically, how do I minimize the time spent going from IR
to native code?


David Jones via llvm-dev

2016-Mar-10 11:30 UTC

Are you running the IR verifier?

You remark on the SmallVectorImpl resize. This might be the same issue I
found in the IR verifier.

The verifier has a check that applies to address space casts. This check
will run even if you have no address space casts in your IR (I suspect the
usual case). The check applies to pointers embedded within data tables. My
IR has a lot of read-only data in tables with bitcast instructions to cast
between pointer types, and this places a load on the verifier far beyond
what the data structure is designed to hold.

A typical example: code generation for a large IR file takes 23 seconds,
but if you enable the verifier, it takes over a minute.

I really ought to file this bug properly.

