Paul J. Lucas
2012-Nov-13 19:27 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
Switching to CodeGenOpt::None reduced the execution time from 5.74s to 0.84s. By just tweaking things randomly, I found that changing to CodeModel::Small reduced it further to 0.22s.

We have some old, ugly, pure C++ code that we're trying to replace (both because it's ugly and because it's slow). Its execution time is about 0.089s, so that's the time to beat.

Hence, I'd like to reduce the 0.22s time even further to below 0.089s. Any ideas?

- Paul

On Nov 12, 2012, at 1:52 PM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:

> Hi Paul,
>
> This is definitely outside the area where I know the particulars of what's going on. However, one idea that might be worth trying is setting the JIT optimization level to 'CodeGenOpt::None'. This should trigger the use of the FastISel instruction selector. Normally, you wouldn't want that for anything other than generating debug code, but since your routines are just making calls, it might work for you.
>
> -Andy
>
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Paul J. Lucas
> Sent: Wednesday, November 07, 2012 7:13 AM
> To: llvmdev at cs.uiuc.edu List
> Subject: Re: [LLVMdev] Using LLVM to serialize object state -- and performance
>
> On Nov 6, 2012, at 11:49 AM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
>
>> I think you may have gone beyond what I understand in how the legacy JIT code works. It looks like the call to addGlobalMapping should short-circuit the named function look up that I described ...
>
> Well, I first look for the function by name and, if I don't find it, then I call addGlobalMapping(). But that's not where the time is going. Here:
>
> https://dl.dropbox.com/u/46791180/callgraph.pdf
>
> is a call graph generated by kcachegrind. I still don't understand all the numbers (and this PDF seems not to include commas where it should), but if you look at the left fork, the bottom two ovals, "Schedule..." is called 16K times and "setHeightToAtLeas..." is called 37K times. On the right fork, RAGreed... is called 35K times.
>
> Those are far too many calls to *anything* for a simple sequence of "call" LLVM instructions. Something seems horribly wrong.
>
> - Paul
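To put the timings quoted in the message above side by side, here is a minimal C++ sketch that computes the ratios between them. The function name `speedup` and the constant names are hypothetical, chosen only for illustration; the four numbers are the ones reported in the message.

```cpp
#include <cassert>

// Ratio of an earlier wall-clock time to a later one. Values above 1.0
// mean the later configuration is faster. (Hypothetical helper name.)
double speedup(double before_s, double after_s) {
    assert(before_s > 0.0 && after_s > 0.0);
    return before_s / after_s;
}

// Timings reported in the message, in seconds. The constant names are
// illustrative only.
constexpr double kStartingPoint = 5.74;   // before switching the opt level
constexpr double kOptNone       = 0.84;   // after CodeGenOpt::None
constexpr double kSmallModel    = 0.22;   // after also using CodeModel::Small
constexpr double kLegacyCpp     = 0.089;  // existing pure C++ code (the target)
```

With these values, `speedup(kStartingPoint, kOptNone)` is roughly 6.8, `speedup(kOptNone, kSmallModel)` is roughly 3.8, and the two changes together give about 26. The remaining gap, `speedup(kSmallModel, kLegacyCpp)`, is about 2.5, which is the factor still needed to match the old C++ code.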
Paul J. Lucas
2012-Nov-14 00:33 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
I've been profiling more; see <https://dl.dropbox.com/u/46791180/perf.png>.

One thing I'm a bit confused about is why I see a FunctionPassManager there. I use a FunctionPassManager at the end of LLVM IR code generation, write the IR to disk, then read it back later. Why is apparently another FunctionPassManager being used during the JIT'ing of the IR code? And how do I control what the passes are to that FunctionPassManager?

The function that's being JIT'd has to be executed only once so, ideally, I want to find a sweet spot between speed-of-JIT'ing and speed-of-generated-machine-code.

- Paul

On Nov 13, 2012, at 11:27 AM, Paul J. Lucas <paul at lucasmail.org> wrote:

> Switching to CodeGenOpt::None reduced the execution time from 5.74s to 0.84s. By just tweaking things randomly, changing to CodeModel::Small reduced it further to 0.22s.
>
> We have some old, ugly, pure C++ code that we're trying to replace (both because it's ugly and because it's slow). Its execution time is about 0.089s, so that's the time to beat.
>
> Hence, I'd like to reduce the 0.22s time even further to below 0.089s. Any ideas?
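The "sweet spot" mentioned above can be framed as a simple cost comparison: total time is the JIT compile time plus the run time multiplied by the number of calls. The sketch below is only a toy model under that assumption. The names `total_cost` and `heavier_opt_pays_off` are hypothetical, and nothing here calls into LLVM.

```cpp
#include <cassert>

// Total wall-clock cost of JIT-compiling a function once and then calling
// it `calls` times. Times are in seconds. (Hypothetical helper name.)
double total_cost(double compile_s, double run_s, unsigned calls) {
    assert(compile_s >= 0.0 && run_s >= 0.0);
    return compile_s + run_s * calls;
}

// Returns true when the configuration that is slower to compile (but
// produces faster code) still has the lower total cost.
bool heavier_opt_pays_off(double fast_compile_s, double fast_run_s,
                          double slow_compile_s, double slow_run_s,
                          unsigned calls) {
    return total_cost(slow_compile_s, slow_run_s, calls) <
           total_cost(fast_compile_s, fast_run_s, calls);
}
```

For a function that runs exactly once (`calls == 1`), heavier optimization only pays off if it saves more run time than the extra compile time it costs, which is why a cheap code generation setting can win in this situation.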
Anton Korobeynikov
2012-Nov-14 01:57 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
> Why is apparently another FunctionPassManager being used during the JIT'ing of the IR code?

Because code generation consists of a series of passes. See lib/CodeGen/LLVMTargetMachine.cpp and lib/CodeGen/Passes.cpp for more information.

> And how do I control what the passes are to that FunctionPassManager?

You should not. There are some options, though, such as the optimization level inside TargetMachine / TargetPassConfig.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
Kaylor, Andrew
2012-Nov-14 02:15 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
The passes run are determined by TargetMachine::addPassesToEmitMachineCode (or addPassesToEmitMC in the case of MCJIT), which is called from the JIT constructor. You can step through that to see where the passes are coming from, or you can create a custom target machine instance to control it.

-Andy

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Paul J. Lucas
Sent: Tuesday, November 13, 2012 4:33 PM
To: llvmdev at cs.uiuc.edu List
Subject: Re: [LLVMdev] Using LLVM to serialize object state -- and performance

I've been profiling more; see <https://dl.dropbox.com/u/46791180/perf.png>.

One thing I'm a bit confused about is why I see a FunctionPassManager there. I use a FunctionPassManager at the end of LLVM IR code generation, write the IR to disk, then read it back later. Why is apparently another FunctionPassManager being used during the JIT'ing of the IR code? And how do I control what the passes are to that FunctionPassManager?

The function that's being JIT'd has to be executed only once so, ideally, I want to find a sweet spot between speed-of-JIT'ing and speed-of-generated-machine-code.

- Paul

On Nov 13, 2012, at 11:27 AM, Paul J. Lucas <paul at lucasmail.org> wrote:

> Switching to CodeGenOpt::None reduced the execution time from 5.74s to 0.84s. By just tweaking things randomly, changing to CodeModel::Small reduced it further to 0.22s.
>
> We have some old, ugly, pure C++ code that we're trying to replace (both because it's ugly and because it's slow). Its execution time is about 0.089s, so that's the time to beat.
>
> Hence, I'd like to reduce the 0.22s time even further to below 0.089s. Any ideas?