Paul J. Lucas
2012-Nov-07 15:12 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
On Nov 6, 2012, at 11:49 AM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:

> I think you may have gone beyond what I understand in how the legacy JIT code works. It looks like the call to addGlobalMapping should short-circuit the named function look up that I described ...

Well, I first look for the function by name and, if I didn't find it, then I call addGlobalMapping(). But that's not where the time is going. Here:

https://dl.dropbox.com/u/46791180/callgraph.pdf

is a call graph generated by kcachegrind. I still don't understand all the numbers (and this PDF seems not to include commas where it should), but if you look at the left fork, the bottom two ovals, "Schedule..." is called 16K times and "setHeightToAtLeas..." is called 37K times. On the right fork, RAGreed... is called 35K times.

Those are far too many calls to *anything* for a simple sequence of "call" LLVM instructions. Something seems horribly wrong.

- Paul
Kaylor, Andrew
2012-Nov-12 21:52 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
Hi Paul,

This is definitely outside the area where I know the particulars of what's going on. However, one idea that might be worth trying is setting the JIT optimization level to 'CodeGenOpt::None'. This should trigger the use of the FastISel instruction selector. Normally, you wouldn't want that for anything other than generating debug code, but since your routines are just making calls, it might work for you.

-Andy
Paul J. Lucas
2012-Nov-13 19:27 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
Switching to CodeGenOpt::None reduced the execution time from 5.74s to 0.84s. Then, just by tweaking things randomly, I found that changing to CodeModel::Small reduced it further to 0.22s.

We have some old, ugly, pure C++ code that we're trying to replace (both because it's ugly and because it's slow). Its execution time is about 0.089s, so that's the time to beat. Hence, I'd like to reduce the 0.22s time even further to below 0.089s. Any ideas?

- Paul