thr3ads.net - llvm dev - [llvm-dev] Intended behavior of CGSCC pass manager. [Jun 2016]

If this information is useful, please help other people find it:
Share via:

Xinliang David Li via llvm-dev

2016-Jun-17 07:49 UTC

[llvm-dev] Intended behavior of CGSCC pass manager.

On Fri, Jun 17, 2016 at 12:27 AM, Sean Silva <chisophugis at gmail.com>
wrote:
>
>
> On Fri, Jun 17, 2016 at 12:10 AM, Xinliang David Li <davidxl at
google.com>
> wrote:
>
>>
>>
>> On Thu, Jun 16, 2016 at 11:51 PM, Sean Silva <chisophugis at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Jun 16, 2016 at 11:07 PM, Xinliang David Li <davidxl at
google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Jun 16, 2016 at 10:47 PM, Sean Silva <chisophugis at
gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 16, 2016 at 9:53 PM, Xinliang David Li
<davidxl at google.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 16, 2016 at 11:12 AM, Sanjoy Das <
>>>>>> sanjoy at playingwithpointers.com> wrote:
>>>>>>
>>>>>>> Hi Sean,
>>>>>>>
>>>>>>> On Thu, Jun 16, 2016 at 4:48 AM, Sean Silva via
llvm-dev
>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>> > One question is what invariants we want to
provide for the
>>>>>>> visitation.
>>>>>>> >
>>>>>>> > For example, should a CGSCC pass be able to
assume that all
>>>>>>> "child" SCC's
>>>>>>> > (SCC's it can reach via direct calls
emanating from the SCC being
>>>>>>> visited)
>>>>>>> > have already been visited? Currently I
don't think it can, and
>>>>>>> IIRC from the
>>>>>>> > discussion at the social this is one thing
that Chandler is hoping
>>>>>>> to fix.
>>>>>>> > The "ref edge" notion in
LazyCallGraph ensures that we cannot
>>>>>>> create a call
>>>>>>> > edge (devirtualizing a ref edge) that will
point at an SCC that
>>>>>>> has not yet
>>>>>>> > been visited.
>>>>>>> >
>>>>>>> > E.g. consider this graph:
>>>>>>> >
>>>>>>> > digraph G {
>>>>>>> >   A -> B; B -> A; // SCC {A,B}
>>>>>>> >   S -> T; T -> S; // SCC {S,T}
>>>>>>> >   X -> Y; Y -> X; // SCC {X,Y}
>>>>>>> >
>>>>>>> >   B -> X;
>>>>>>> >   B -> S;
>>>>>>> >   T -> Y [label="Ref edge that is
devirtualized\nwhen visiting SCC
>>>>>>> > {S,T}",constraint=false,style=dashed]
>>>>>>> > }
>>>>>>> >
>>>>>>> > (can visualize conveniently at
http://www.webgraphviz.com/ or I
>>>>>>> have put an
>>>>>>> > image at http://reviews.llvm.org/F2073104)
>>>>>>> >
>>>>>>> > If we do not consider the ref graph, then it
is possible for SCC
>>>>>>> {S,T} to be
>>>>>>>
>>>>>>> I'm not sure why you wouldn't consider the
ref graph?  I think the
>>>>>>> general idea is to visit RefSCCs in bottom up
order, and when
>>>>>>> visiting
>>>>>>> a RefSCC, visiting the SCC's inside the RefSCC
in bottom up order.
>>>>>>>
>>>>>>> So in your example, given the edges you've
shown, we will visit {X,Y}
>>>>>>> before visiting {S,T}.
>>>>>>>
>>>>>>> > A more complicated case is when SCC {S,T} and
SCC {X,Y} both call
>>>>>>> into each
>>>>>>> > other via function pointers. So eventually
after devirtualizing
>>>>>>> the calls in
>>>>>>> > both directions there will be a single SCC
{S,T,X,Y}.
>>>>>>> >
>>>>>>> > digraph G {
>>>>>>> >   A -> B; B -> A; // SCC {A,B}
>>>>>>> >   S -> T; T -> S; // SCC {S,T}
>>>>>>> >   X -> Y; Y -> X; // SCC {X,Y}
>>>>>>> >
>>>>>>> >   B -> X;
>>>>>>> >   B -> S;
>>>>>>> >   T -> Y [label="Ref edge that is
devirtualized\nwhen visiting SCC
>>>>>>> > {S,T}",constraint=false,style=dashed]
>>>>>>> >   X -> S [label="Ref edge that is
devirtualized\nwhen visiting SCC
>>>>>>> > {X,Y}",constraint=false,style=dashed]
>>>>>>> > }
>>>>>>> >
>>>>>>> > (rendering at:
http://reviews.llvm.org/F2073479)
>>>>>>> >
>>>>>>> > Due to the cyclic dependence there is no SCC
visitation order that
>>>>>>> can
>>>>>>> > directly provide the invariant above. Indeed,
the problem of
>>>>>>> maintaining the
>>>>>>>
>>>>>>> I think the workflow in the above will (roughly)
be:
>>>>>>>
>>>>>>>  Visit the RefSCC {X,Y,S,T}
>>>>>>>
>>>>>>
>>>>>> Are we sure RefSCC has ref edges between {X, Y} and {S,
T} in this
>>>>>> case?  I may miss the code handling it.
>>>>>>
>>>>>
>>>>> The ref graph is conservative and so it would have the
appropriate
>>>>> edges.
>>>>>
>>>>>
https://github.com/llvm-project/llvm-project/blob/master/llvm/lib/Analysis/LazyCallGraph.cpp#L90
>>>>>
>>>>
>>>>
>>>> That is where the confusion is -- the findReferences's
function body
>>>> only handles constant operands which may be function addresses.
Did I miss
>>>> something obvious?
>>>>
>>>
>>> It is somewhat subtle. There are 3 potential meanings of "call
graph" in
>>> this thread:
>>> 1. The graph of *direct calls* in the current module. (this is
mutated
>>> during optimization)
>>> 2. A supergraph of 1., conservatively chosen such 1. remains a
subgraph
>>> under arbitrary semantics-preserving function transformations
(note: 2. and
>>> 1. must be updated in sync during deletion of functions).
>>> 3. The conservative graph representing all edges which may exist
*at
>>> runtime*.
>>>
>>> LazyCallGraph models 1. (the "call graph") and 2. (the
"ref graph"). It
>>> does not model 3.
>>> The search for constants in findReferences guarantees that we find
(a
>>> conserative superset of) all call destinations addresses that
>>> transformations may add direct calls (and hence update 1.).
>>>
>>> The existing CallGraph data structure (used by e.g. the old PM
CGSCC
>>> visitation) only models 1.
>>>
>>>
>>>
>>>>   Another point is that it may not be practical to model edges
to
>>>> indirect targets. For virtual calls, without CHA, each virtual
callsite
>>>> will end up referencing all potential address taken functions.
>>>>
>>>
>>> In this statement, you are thinking in terms of 3.
>>> Note that in both 1. and 2. the indirect calls are modeled as
calling an
>>> external dummy node. This dummy node is *distinct* from a second
external
>>> dummy node that calls all address-taken functions. (hence the
indirect
>>> calls do not end up forming a RefSCC with all address taken
functions).
>>>
>>
>>
>> For C++ programs, most ref edges are probably from ctors and dtors that
>> reference vtable -- they will have large fan-out of ref edges to
virtual
>> member functions they are unlikely to call.  In other words, such ref
edges
>> mostly do not represent real caller->callee relationship, so I
wonder what
>> is the advantage of forming refSCC from SCCs?
>>
>
>
> I believe it is primarily used for ordering the visitation of CallSCC's
> (i.e. SCC's in the "call graph").
>
This is what it can do -- but what benefit does it provide?

David


>
> -- Sean Silva
>
>
>>
>> David
>>
>>
>>
>>> Note that `opt -dot-callgraph` can produce a graphviz file for the
old
>>> PM callgraph. I'm not sure if we have one for LazyCallGraph
(which has both
>>> ref graph and call graph); I'll post a patch adding one if not.
>>>
>>> -- Sean Silva
>>>
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>    Visit the SCC {X,Y} // arbitrary
>>>>>>>      Optimize({X,Y})
>>>>>>>      // Now there's an edge to {S,T},
invalidate
>>>>>>>      // the analyses cached for {X,Y} and visit
{S,T}
>>>>>>>
>>>>>>
>>>>>> I am not sure if this makes sense.   If dynamically,
the call edge
>>>>>> from {X, Y} to {S, T} does exist, but not discovered by
the analysis, then
>>>>>> the cached {X, Y} will still be invalid, but who is
going to invalidate it?
>>>>>>
>>>>>
>>>>> I assume that if dynamically there was a call from {X,Y} to
{S,T},
>>>>> then the analysis would have observed an indirect call and
would have
>>>>> behaved conservatively.
>>>>>
>>>>
>>>> See above, I did not see how indirect call is handled. Also if
the
>>>> result for {X, Y} is conservatively correct before the direct
call edge is
>>>> discovered, why bother invalidating its analysis when the call
edge is
>>>> exposed?
>>>>
>>>> David
>>>>
>>>>
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>>
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>    Visit the SCC {S,T}
>>>>>>>      Optimize({S,T})
>>>>>>>      // Now {X,Y,S,T} collapses to form a single
SCC
>>>>>>>    Visit the SCC {S,T,X,Y}
>>>>>>>      Optimize({S,T,X,Y})
>>>>>>>
>>>>>>> The difficult bit is to make the inner "//
Now.*" bits work well.
>>>>>>>
>>>>>>> -- Sanjoy
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160617/3fe66e8d/attachment.html>

Sanjoy Das via llvm-dev

2016-Jun-19 07:01 UTC

head link

[llvm-dev] Intended behavior of CGSCC pass manager.

Hi David,

Xinliang David Li wrote:
 >>  I believe it is primarily used for ordering the visitation of
CallSCC's (i.e. SCC's in the "call graph").
 > This is what it can do -- but what benefit does it provide?

One benefit is that once you get to a function F that constructs an
instance of a class with virtual functions and then calls a virtual
function on the instance, then the virtual function being called and
the constructor will have been maximally simplified (F refs the
constructor, and the constructor refs all the virtual functions), and
you're more likely to inline the constructor and devirtualize the
call.  I don't have any real data to back up that this will materially
help, though.

-- Sanjoy

Sean Silva via llvm-dev

2016-Jun-20 15:43 UTC

head link

[llvm-dev] Intended behavior of CGSCC pass manager.

On Sun, Jun 19, 2016 at 12:01 AM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> Hi David,
>
> Xinliang David Li wrote:
> >>  I believe it is primarily used for ordering the visitation of
> CallSCC's (i.e. SCC's in the "call graph").
> > This is what it can do -- but what benefit does it provide?
>
> One benefit is that once you get to a function F that constructs an
> instance of a class with virtual functions and then calls a virtual
> function on the instance, then the virtual function being called and
> the constructor will have been maximally simplified (F refs the
> constructor, and the constructor refs all the virtual functions), and
> you're more likely to inline the constructor and devirtualize the
> call.

That is true for a graph like http://reviews.llvm.org/F2073104 but not one
like http://reviews.llvm.org/F2073479
That is, there is no real guarantee.

>   I don't have any real data to back up that this will materially
> help, though.

And we haven't had an RFC for any of this...

-- Sean Silva

>
>
> -- Sanjoy
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160620/1248956a/attachment.html>

Xinliang David Li via llvm-dev

2016-Jun-20 20:17 UTC

head link

[llvm-dev] Intended behavior of CGSCC pass manager.

On Sun, Jun 19, 2016 at 12:01 AM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> Hi David,
>
> Xinliang David Li wrote:
> >>  I believe it is primarily used for ordering the visitation of
> CallSCC's (i.e. SCC's in the "call graph").
> > This is what it can do -- but what benefit does it provide?
>
> One benefit is that once you get to a function F that constructs an
> instance of a class with virtual functions and then calls a virtual
> function on the instance, then the virtual function being called and
> the constructor will have been maximally simplified (F refs the
> constructor, and the constructor refs all the virtual functions), and
> you're more likely to inline the constructor and devirtualize the
> call.  I don't have any real data to back up that this will materially
> help, though.


Sanjoy, this is a good example. The code pattern is basically like this:

Worker(Base *B) {
    B->vCall();
}

Factory::create(Kind K) {
   if (K == ..) return new D1();
   else ...
}

Caller() {
    ..
    Base *B = Factory::create(K, ...);
    Worker(B);
}


The added ordering constraints from Factory::create() node to all virtual
methods in Base's hierarchy ensures that after
1) Factory::create gets inlined to Caller, and
2) Worker(..) method gets inlined to Caller, and
3) newly exposed vcall gets devirtualized

the inliner sees a callee to say D1::vCall which is already simplified.


However, in real applications, what I see is the following pattern (for
instances LLVM's Pass )

Caller() {
    Base *B =  Factory::create(...);
    Stash (B);  // store the object in some container to be retrieved later
  ...
}

SomeTask() {

   Base *B = findObject(...);
   B->vCall(); // do the work
}

Driver() {
     Caller();  // create objects ...
     SomeTask();
}


Set aside the fact that it is usually much harder to do de-viritualization
in this case,   assuming  the virtual call in SomeTask can be
devritualized. What we need is that the virtual functions are processed
before SomeTask node, but this is not guaranteed unless we also model the
call edge ordering imposed by control flow.

However, this is enforcing virtual methods to be processed before their
object's creators. Are there other simpler ways to achieve the effect (if
we have data to justify it)?

David



>
> -- Sanjoy
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160620/18624daa/attachment.html>

llvm dev - Jun 2016 - Intended behavior of CGSCC pass manager.

[llvm-dev] Intended behavior of CGSCC pass manager.

[llvm-dev] Intended behavior of CGSCC pass manager.

[llvm-dev] Intended behavior of CGSCC pass manager.

[llvm-dev] Intended behavior of CGSCC pass manager.