Xinliang David Li via llvm-dev
2016-Jun-20 20:17 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
On Sun, Jun 19, 2016 at 12:01 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:

> Hi David,
>
> Xinliang David Li wrote:
> >> I believe it is primarily used for ordering the visitation of
> >> CallSCC's (i.e. SCC's in the "call graph").
> >
> > This is what it can do -- but what benefit does it provide?
>
> One benefit is that once you get to a function F that constructs an
> instance of a class with virtual functions and then calls a virtual
> function on the instance, then the virtual function being called and
> the constructor will have been maximally simplified (F refs the
> constructor, and the constructor refs all the virtual functions), and
> you're more likely to inline the constructor and devirtualize the
> call. I don't have any real data to back up that this will materially
> help, though.

Sanjoy, this is a good example. The code pattern is basically like this:

    Worker(Base *B) {
      B->vCall();
    }

    Factory::create(Kind K) {
      if (K == ..)
        return new D1();
      else
        ...
    }

    Caller() {
      ..
      Base *B = Factory::create(K, ...);
      Worker(B);
    }

The added ordering constraints from the Factory::create() node to all
virtual methods in Base's hierarchy ensure that after

  1) Factory::create gets inlined into Caller,
  2) Worker(..) gets inlined into Caller, and
  3) the newly exposed vcall gets devirtualized,

the inliner sees a callee, say D1::vCall, that is already simplified.

However, in real applications, what I see is the following pattern (for
instance, LLVM's Pass):

    Caller() {
      Base *B = Factory::create(...);
      Stash(B); // store the object in some container to be retrieved later
      ...
    }

    SomeTask() {
      Base *B = findObject(...);
      B->vCall(); // do the work
    }

    Driver() {
      Caller(); // create objects
      ...
      SomeTask();
    }

Setting aside the fact that devirtualization is usually much harder in
this case, assume the virtual call in SomeTask can be devirtualized.
What we need is for the virtual functions to be processed before the
SomeTask node, but this is not guaranteed unless we also model the call
edge ordering imposed by control flow.

However, what is being enforced is that virtual methods are processed
before their objects' creators. Are there other, simpler ways to achieve
the effect (if we have data to justify it)?

David

>
> -- Sanjoy
>
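A minimal sketch of the ordering question raised for the second pattern, assuming a toy reference graph in which an edge F -> G means "F refs G" and a plain post-order walk stands in for bottom-up visitation. The names `RefGraph`, `postOrder`, `indexOf`, and `stashPattern` are hypothetical and are not LLVM APIs; the edge from `D1::D1` to `D1::vCall` models the constructor referencing the vtable, as described in the thread. The graph is acyclic, so every SCC is a single node.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical reference graph: an edge F -> G means "F refs G".
using RefGraph = std::map<std::string, std::vector<std::string>>;

// Returns a post-order (bottom-up) visitation order. The DFS is started
// from each entry of `roots` in turn, so different root orders can give
// different, equally valid bottom-up orders.
std::vector<std::string> postOrder(const RefGraph &G,
                                   const std::vector<std::string> &roots) {
  std::vector<std::string> order;
  std::set<std::string> seen;
  std::function<void(const std::string &)> visit =
      [&](const std::string &node) {
        if (!seen.insert(node).second)
          return; // already visited
        auto it = G.find(node);
        if (it != G.end())
          for (const auto &succ : it->second)
            visit(succ);
        order.push_back(node); // emitted only after everything it refs
      };
  for (const auto &root : roots)
    visit(root);
  return order;
}

// Position of `name` in `order`, or -1 if absent.
int indexOf(const std::vector<std::string> &order, const std::string &name) {
  for (std::size_t i = 0; i < order.size(); ++i)
    if (order[i] == name)
      return static_cast<int>(i);
  return -1;
}

// The "stash" pattern from the message: SomeTask has no ref path to
// D1::vCall, only Caller does (through Factory::create and the constructor).
RefGraph stashPattern() {
  return {
      {"Driver", {"Caller", "SomeTask"}},
      {"Caller", {"Factory::create", "Stash"}},
      {"Factory::create", {"D1::D1"}},
      {"D1::D1", {"D1::vCall"}},
      {"SomeTask", {"findObject"}},
  };
}
```

In this sketch, starting the walk at `Driver` happens to place `D1::vCall` before `SomeTask`, while starting it at `SomeTask` places `SomeTask` first. Both orders respect every ref edge, which is the sense in which the ordering is "not guaranteed"; `D1::vCall` precedes `Caller` and `Driver` either way.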
Sanjoy Das via llvm-dev
2016-Jun-20 20:50 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
Hi David,

Xinliang David Li wrote:
> [snip]
>
> However, in real applications, what I see is the following pattern (for
> instance, LLVM's Pass):
>
> Caller() {
>   Base *B = Factory::create(...);
>   Stash(B); // store the object in some container to be retrieved later
>   ...
> }
>
> SomeTask() {
>   Base *B = findObject(...);
>   B->vCall(); // do the work
> }
>
> Driver() {
>   Caller(); // create objects
>   ...
>   SomeTask();
> }
>
> Setting aside the fact that devirtualization is usually much harder in
> this case, assume the virtual call in SomeTask can be devirtualized.
> What we need is for the virtual functions to be processed before the
> SomeTask node, but this is not guaranteed unless we also model the call
> edge ordering imposed by control flow.

I think the thesis here is that you cannot devirtualize the call in
`SomeTask` without also looking at `Caller` [0]. So the flow is:

 - Optimize Caller and SomeTask independently as much as you want.
   * Caller -refs-> Factory::create, which -refs-> the constructors,
     which -ref-> the various implementations of the virtual functions
     (based on my current understanding of how C++ vtables are
     lowered); so these implementations should have been simplified by
     the time we look at Caller.

 - Then look at Driver. Caller and SomeTask are both maximally
   simplified. We now (presumably) inline Caller and SomeTask,
   devirtualize the B->vCall (as you said: theoretically possible, but
   if findObject etc. are complex then practically maybe not), and now
   inline the maximally simplified devirtualized call targets.

> However, this is enforcing virtual methods to be processed before their
> object's creators. Are there other simpler ways to achieve the effect
> (if we have data to justify it)?

Honestly: I'll have to think about it. It is entirely possible that a
(much?) simpler design will catch 99% (or even more) of the
idiomatic cases; I just don't have a good mental model for what those
cases are.

At this point I'm waiting for Chandler to upload his patch so that we
can have this discussion on the review thread. :)

[0]: This breaks down when we allow "out of thin air"
devirtualizations (I'm stealing this term from memory models, but I
think it is appropriate here :) ), where you look at a call site and
"magically" (i.e. in a way not expressible in terms of "normal"
optimizations like store forwarding, PRE, GVN etc.) are able to
devirtualize the call site. We do this all the time in Java (we'll
look at the type of the receiver object, look at the current class
hierarchy and directly mandate that a certain call site has to have a
certain target), but the RefSCC call graph does not allow for that.
These kinds of out-of-thin-air devirtualizations will have to be
modeled as ModulePasses, IIUC.

-- Sanjoy
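A rough sketch of the kind of "out of thin air" devirtualization described in footnote [0], assuming a toy class-hierarchy model: the target is chosen from the receiver's static type and the set of known classes alone, without tracing where the object came from. Everything here (`ClassInfo`, `Hierarchy`, `derivesFrom`, `implFor`, `uniqueTarget`) is a hypothetical illustration, not an LLVM or JVM API, and it assumes single inheritance with a well-formed (acyclic) parent chain.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical class-hierarchy model (single inheritance only).
struct ClassInfo {
  std::string parent;            // empty string for a root class
  std::set<std::string> defines; // virtual methods implemented in this class
  bool concrete = true;          // false for abstract classes
};
using Hierarchy = std::map<std::string, ClassInfo>;

// True if `cls` is `base` or inherits from it.
bool derivesFrom(const Hierarchy &H, std::string cls, const std::string &base) {
  while (!cls.empty()) {
    if (cls == base)
      return true;
    auto it = H.find(cls);
    if (it == H.end())
      return false;
    cls = it->second.parent;
  }
  return false;
}

// The implementation of `method` that an object of dynamic type `cls` uses:
// the nearest definition found while walking up the parent chain.
std::optional<std::string> implFor(const Hierarchy &H, std::string cls,
                                   const std::string &method) {
  while (!cls.empty()) {
    auto it = H.find(cls);
    if (it == H.end())
      return std::nullopt;
    if (it->second.defines.count(method))
      return cls + "::" + method;
    cls = it->second.parent;
  }
  return std::nullopt;
}

// If every concrete class compatible with `staticType` resolves `method` to
// the same implementation, return it; otherwise return nullopt.
std::optional<std::string> uniqueTarget(const Hierarchy &H,
                                        const std::string &staticType,
                                        const std::string &method) {
  std::optional<std::string> target;
  for (const auto &[name, info] : H) {
    if (!info.concrete || !derivesFrom(H, name, staticType))
      continue;
    std::optional<std::string> impl = implFor(H, name, method);
    if (!impl || (target && *target != *impl))
      return std::nullopt; // unresolved or ambiguous: leave the call virtual
    target = impl;
  }
  return target;
}
```

With an abstract `Base` and a single concrete subclass `D1` that defines `vCall`, `uniqueTarget(H, "Base", "vCall")` yields `D1::vCall`; adding a second concrete subclass with its own `vCall` makes the answer ambiguous. The decision depends on the whole hierarchy rather than on ref edges between functions, which is consistent with the footnote's point that such a transformation fits a module-level pass better than a bottom-up SCC walk.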
Xinliang David Li via llvm-dev
2016-Jun-20 21:07 UTC
[llvm-dev] Intended behavior of CGSCC pass manager.
On Mon, Jun 20, 2016 at 1:50 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:

> Hi David,
>
> Xinliang David Li wrote:
> > [snip]
> >
> > However, in real applications, what I see is the following pattern (for
> > instance, LLVM's Pass):
> >
> > Caller() {
> >   Base *B = Factory::create(...);
> >   Stash(B); // store the object in some container to be retrieved later
> >   ...
> > }
> >
> > SomeTask() {
> >   Base *B = findObject(...);
> >   B->vCall(); // do the work
> > }
> >
> > Driver() {
> >   Caller(); // create objects
> >   ...
> >   SomeTask();
> > }
> >
> > Setting aside the fact that devirtualization is usually much harder in
> > this case, assume the virtual call in SomeTask can be devirtualized.
> > What we need is for the virtual functions to be processed before the
> > SomeTask node, but this is not guaranteed unless we also model the call
> > edge ordering imposed by control flow.
>
> I think the thesis here is that you cannot devirtualize the call in
> `SomeTask` without also looking at `Caller` [0]. So the flow is:
>
>  - Optimize Caller and SomeTask independently as much as you want.
>    * Caller -refs-> Factory::create, which -refs-> the constructors,
>      which -ref-> the various implementations of the virtual functions
>      (based on my current understanding of how C++ vtables are
>      lowered); so these implementations should have been simplified by
>      the time we look at Caller.
>
>  - Then look at Driver. Caller and SomeTask are both maximally
>    simplified. We now (presumably) inline Caller and SomeTask,
>    devirtualize the B->vCall (as you said: theoretically possible, but
>    if findObject etc. are complex then practically maybe not), and now
>    inline the maximally simplified devirtualized call targets.

I agree with the analysis. Practically speaking, this pretty much means
the theoretical opportunities won't be exposed until after lots of
functions are inlined into the top-level functions, which usually
doesn't happen. I have not seen such cases in practice myself.

> > However, this is enforcing virtual methods to be processed before their
> > object's creators. Are there other simpler ways to achieve the effect
> > (if we have data to justify it)?
>
> Honestly: I'll have to think about it. It is entirely possible that a
> (much?) simpler design will catch 99% (or even more) of the
> idiomatic cases; I just don't have a good mental model for what those
> cases are.
>
> At this point I'm waiting for Chandler to upload his patch so that we
> can have this discussion on the review thread. :)
>
> [0]: This breaks down when we allow "out of thin air"
> devirtualizations (I'm stealing this term from memory models, but I
> think it is appropriate here :) ), where you look at a call site and
> "magically" (i.e. in a way not expressible in terms of "normal"
> optimizations like store forwarding, PRE, GVN etc.) are able to
> devirtualize the call site. We do this all the time in Java (we'll
> look at the type of the receiver object, look at the current class
> hierarchy and directly mandate that a certain call site has to have a
> certain target), but the RefSCC call graph does not allow for that.
> These kinds of out-of-thin-air devirtualizations will have to be
> modeled as ModulePasses, IIUC.

Yes, not all bottom-up passes have to be grouped with other SCC passes,
nor are all IPA optimizations suitable to be implemented as bottom-up
passes.

David

>
> -- Sanjoy
>