I am trying to improve my application's compile-time performance.

On a given workload, it takes 68 seconds to compile some code. If I disable LLVM code generation (i.e. I still generate IR instructions, but skip the LLVM optimization and instruction selection steps), my compile time drops to 3 seconds. If I also write out the LLVM IR (just to prove that I am generating it), my compile time is 4 seconds. So we're spending >90% of the time in LLVM code generation.

To try to determine if there's anything I can do, I ran:

    time /tools/llvm/3.7.1dbg/bin/opt -O1 -filetype=obj -o opt.o my_ir.ll -time-passes

and I get:

    ===-------------------------------------------------------------------------==
                          ... Pass execution timing report ...
    ===-------------------------------------------------------------------------==
      Total Execution Time: 19.1382 seconds (19.1587 wall clock)

       ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
       4.4755 ( 23.5%)   0.0000 (  0.0%)   4.4755 ( 23.4%)   4.4806 ( 23.4%)  Dead Store Elimination
       3.6255 ( 19.0%)   0.0000 (  0.0%)   3.6255 ( 18.9%)   3.6282 ( 18.9%)  Combine redundant instructions
       1.2138 (  6.4%)   0.0040 (  5.0%)   1.2178 (  6.4%)   1.2185 (  6.4%)  SROA
       ...

    real    1m7.783s
    user    1m7.548s
    sys     0m0.183s

So: opt reports that it took 19 seconds, but overall, the run took 68 seconds. The system in question is a 6-core AMD K10 with 8GB of memory. The system is not running anything else at the time.

What activity accounts for the unaccounted-for time?

For my application, IR verification has pathological performance (I ought to file a bug on that), so I disable it. It is not clear whether the IR verifier is running in my opt runs; there is no line item for it.

It is also not clear whether opt does instruction selection. I tried specifying -filetype=null, but that makes no difference to the run time.
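The size of the gap can be worked out directly from the numbers quoted in the message. The following is a minimal Python sketch, assuming the relevant lines of the output have been pasted into a string. The names REPORT, pass_total_seconds and shell_time_seconds are illustrative only and are not part of any LLVM tool.

```python
import re

# Excerpt of the output quoted in the message: the -time-passes total
# and the lines printed by the shell's `time` builtin.
REPORT = """\
  Total Execution Time: 19.1382 seconds (19.1587 wall clock)
real 1m7.783s
user 1m7.548s
sys 0m0.183s
"""


def pass_total_seconds(text):
    """Return the wall-clock figure from the 'Total Execution Time' line."""
    m = re.search(
        r"Total Execution Time:\s*[\d.]+ seconds \(([\d.]+) wall clock\)", text
    )
    return float(m.group(1))


def shell_time_seconds(text, label="real"):
    """Convert a `time` line such as 'real 1m7.783s' to seconds."""
    m = re.search(rf"^{label}\s+(\d+)m([\d.]+)s", text, re.MULTILINE)
    return int(m.group(1)) * 60 + float(m.group(2))


passes = pass_total_seconds(REPORT)   # time attributed to the listed passes
real = shell_time_seconds(REPORT)     # wall-clock time of the whole run
unaccounted = real - passes

print(f"timed passes: {passes:.1f} s")
print(f"whole run:    {real:.1f} s")
print(f"unaccounted:  {unaccounted:.1f} s ({100 * unaccounted / real:.0f}% of the run)")
```

With the figures above, about 48.6 of the 67.8 seconds (roughly 72% of the run) fall outside the total that the pass timing report accounts for.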
Justin Lebar via llvm-dev
2016-Mar-09 21:39 UTC
[llvm-dev] Where is opt spending its time?
On Wed, Mar 9, 2016 at 7:52 AM, David Jones via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> What activity accounts for the unaccounted-for time?

If you're on Linux, consider using a proper CPU profiler, such as perf(1). It's really easy to use -- on x86-64, compiling RelWithDebInfo with -fno-omit-frame-pointer has given me excellent results.

Good luck.
koffie drinker via llvm-dev
2016-Mar-10 10:04 UTC
[llvm-dev] Where is opt spending its time?
Hi,

I'm having the same issue. You can speed up the JIT by disabling the code gen optimizations when creating the execution engine:

    .setOptLevel(llvm::CodeGenOpt::None)

and by trying to enable fast instruction selection through:

    .setTargetOptions

But with the above applied, my profiler (in release mode, of course) still shows a lot of time (86%) spent in JIT code gen. It's also odd that when I look at the individual functions in the profile, malloc and free take up 80% of the total time. 40% of it comes from a SmallVectorImpl resize in the pass manager.

The modules generally contain around 3 small functions, so this should be fast. For my project, fast JIT time is more important than the actual runtime, since the statements are "simple". I do run a pass manager on functions to optimize the IR.

So what is generally the best approach when you require fast code generation time? Specifically, how do I minimize the time spent going from IR to native code?
On Thu, Mar 10, 2016 at 5:04 AM, koffie drinker <gekkekoe at gmail.com> wrote:
> It's also weird that when I look at the individual functions in the
> profile, malloc and free are taking up 80% of the total time.
> 40% of it is done with a smallvectorimpl resize in the passmanager.

Are you running the IR verifier? You remark on the SmallVectorImpl resize. This might be the same issue I found in the IR verifier.

The verifier has a check that applies to address space casts. This check will run even if you have no address space casts in your IR (which I suspect is the usual case). The check applies to pointers embedded within data tables. My IR has a lot of read-only data in tables, with bitcast instructions to cast between pointer types, and this places a load on the verifier far beyond what the data structure is designed to hold.

A typical example: code generation for a large IR file takes 23 seconds, but if you enable the verifier, it takes over a minute.

I really ought to file this bug properly.