Hi all, A while ago there was a discussion on changing the current "set of trees" representation of TBAA metadata to be more expressive, prompted by the need to support C structs. Dan Gohman also talked about the issue here: http://llvm.org/devmtg/2012-11/Gohman-AliasAnalysis.pdf. It was suggested that the trees be replaced by a type DAG then. While working on this compiler <http://disciple.ouroborus.net/>, I ended up using an undirected graph to represent aliasing instead, I believe it might be suitable for TBAA's purposes as well, for the following reasons. * Does the graph need to be acyclic? Consider these struct types: struct a { type1 x; type2 y } struct b { type2 y; type3 z } struct c { type3 z; type1 x } They form the following alias graph (sorry about the crappy ascii art): a --> b --> c -+ ^______________| Which won't be representable if we force our tbaa graph to be acyclic. * Does it need to be directed? If we use a directed graph, then I suppose we'd be relying on "reachability" in both directions to find out if something aliases something else: +--> char <--+ | | float int ^ ^ |___ MyClass __| To answer "does myclass alias char?", we'd need to check (myclass is reachable from char || char is reachable from myclass), similar to how two things alias if they are ascendant/descendant of each other in the current tree representation. Since we're going both ways, I think it makes sense to just reify the reachability edges like this (note the lack of direction, now to check if two things alias, we simply ask if there is an edge connecting them): +--- char ---+ | | | float | int |___ MyClass __| This can be represented with a (pretty dense) adjacency matrix and queried in constant time, so I thought it might be faster.>From what I can gather without delving deep into the codebase, it's easiestright now to change from the "set of trees" representation to a directed graph, but I don't think changing to an undirected graph would be much harder (please correct me if I'm wrong). I'd like to try and tackle the implementation as well, if that's ok. Thoughts? Thanks! -- Tran. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130305/ad121a7a/attachment.html>
On Mon, Mar 4, 2013 at 11:12 PM, Tran Ma <tranma at cse.unsw.edu.au> wrote:> Hi all, > > A while ago there was a discussion on changing the current "set of trees" > representation of TBAA metadata to be more expressive, prompted by the need > to support C structs. Dan Gohman also talked about the issue here: > http://llvm.org/devmtg/2012-11/Gohman-AliasAnalysis.pdf. It was suggested > that the trees be replaced by a type DAG then. While working on this > compiler, I ended up using an undirected graph to represent aliasing > instead, I believe it might be suitable for TBAA's purposes as well, for the > following reasons. > > * Does the graph need to be acyclic? Consider these struct types: > > struct a { type1 x; type2 y } > > struct b { type2 y; type3 z } > > struct c { type3 z; type1 x } > > They form the following alias graph (sorry about the crappy ascii art): > > a --> b --> c -+ > > ^______________| > > Which won't be representable if we force our tbaa graph to be acyclic.I don't see why your example forms the above graph, could you explain? I believe it should form a forest with 3 roots (a, b, c), and the following edges a->x a->y b->y b->z c->z c->x I'm not sure how you got to a->b->c->a> > * Does it need to be directed? If we use a directed graph, then I suppose > we'd be relying on "reachability" in both directions to find out if > something aliases something else: > > +--> char <--+ > > | | > > float int > > ^ ^ > > |___ MyClass __| > > To answer "does myclass alias char?", we'd need to check (myclass is > reachable from char || char is reachable from myclass),Yes, the graphs essentially represent a set/subset relationship, and you are checking if either is a subset of the other.> similar to how two > things alias if they are ascendant/descendant of each other in the current > tree representation.> Since we're going both ways,There are actually cases you don't need to go both ways.> I think it makes sense to > just reify the reachability edges like this (note the lack of direction, now > to check if two things alias, we simply ask if there is an edge connecting > them): > > +--- char ---+ > > | | | > > float | int > > |___ MyClass __|But this is wrong, TBAA is in fact, directed. struct S { int i; double d; }; access to struct S can alias int or double access to double cannot alias struct S (access to int can because of the first member rule, but otherwise can't). In your undirected graph, you will say that a store to double can alias struct S.> > This can be represented with a (pretty dense) adjacency matrix and queried > in constant time, so I thought it might be faster. > > From what I can gather without delving deep into the codebase, it's easiest > right now to change from the "set of trees" representation to a directed > graph, but I don't think changing to an undirected graph would be much > harder (please correct me if I'm wrong). I'd like to try and tackle the > implementation as well, if that's ok. > > Thoughts? > > Thanks! > > -- > Tran. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
On 07/03/2013, at 1:45 AM, Daniel Berlin <dberlin at dberlin.org> wrote:> On Wed, Mar 6, 2013 at 4:23 AM, Tran Ma <tranma at cse.unsw.edu.au> wrote: >> It was derived from what I read in Dan Gohman's slides about one of the >> possible forms a type DAG could take. Your forest is what we should get in >> the current tree representation (I believe), so when we have a load/store >> involving x, we can only annotate the instruction with the x node from the >> tree rooted at a or c. If we choose the tree at a, we'd lose the information >> that x does not alias z, and if we choose c, we won't know that x does not >> alias y. However if we use a graph, such as this: >> >> a b >> / \ / \ >> v v v v >> x y z >> ^ ^ >> \__ c __/ >> >> Then there wouldn't be a problem. This kind of graph was also described in >> the slides as one of the possible representations, I don't know what was >> eventually decided upon. > > The tree is what is currently used, but i'm also not sure the graph > from his slides will really work. > We'll see.The second approach (titled "a more precise type DAG") looks promising, I am convinced that a graph is what we need (the other suggestion was to make it possible to attach a list of metadata nodes to instructions, which was deemed too cumbersome), I'm trying to convince myself about the A and D part.> You can't actually form cyclic graphs if you follow the typing rules, > because mutually recursive types are impossible. > However, It's entirely possible that Dan has described a > representation/implementation that causes cycles :)I believe you are right. I'm not saying cycles are bad, for my use case I don't actually care whether the representation is acyclic or not, it's just to make sure we don't make the graph more restrictive than it has to be.>> The reason why I assumed direction wasn't needed is because I thought the >> relation `alias` was symmetric: if x aliases y then y alias x in all cases. >> Is this the case for tbaa? > > It should not be, but LLVM may have done this :) > See the difference in GCC's alias.c between alias_sets_conflict_p and > alias_set_subset_of_p. > > You can see where the subset (one-way checking) is used in tree-ssa-alias.c > > If LLVM does always use this rule, and plans on keeping it that way, > you are correct that there is no point in having a directed graph at > all, since you aren't using the direction.That's quite strange. I'm not too familiar with the implementation, but it sounds like keeping the direction is easier than dropping it (despite what the docs say and the symmetry of `alias`), so a change to using DAGs is probably most likely to be accepted. (How do I go about doing this by the way? Do it and send someone a patch?) Thanks heaps, -- Tran (orodru.in)
On Wed, Mar 6, 2013 at 12:31 PM, Tran Ma <tranma at cse.unsw.edu.au> wrote:> On 07/03/2013, at 1:45 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > >> On Wed, Mar 6, 2013 at 4:23 AM, Tran Ma <tranma at cse.unsw.edu.au> wrote: >>> It was derived from what I read in Dan Gohman's slides about one of the >>> possible forms a type DAG could take. Your forest is what we should get in >>> the current tree representation (I believe), so when we have a load/store >>> involving x, we can only annotate the instruction with the x node from the >>> tree rooted at a or c. If we choose the tree at a, we'd lose the information >>> that x does not alias z, and if we choose c, we won't know that x does not >>> alias y. However if we use a graph, such as this: >>> >>> a b >>> / \ / \ >>> v v v v >>> x y z >>> ^ ^ >>> \__ c __/ >>> >>> Then there wouldn't be a problem. This kind of graph was also described in >>> the slides as one of the possible representations, I don't know what was >>> eventually decided upon. >> >> The tree is what is currently used, but i'm also not sure the graph >> from his slides will really work. >> We'll see. > > The second approach (titled "a more precise type DAG") looks promising, I am convinced that a graph is what we need (the other suggestion was to make it possible to attach a list of metadata nodes to instructions, which was deemed too cumbersome), I'm trying to convince myself about the A and D part.Well, as you said, if the docs say they always check both directions, the directioness is unnecessary. Acyclicness could be guaranteed if necessary, and makes things easier (otherwise doing walks is hard).> >> You can't actually form cyclic graphs if you follow the typing rules, >> because mutually recursive types are impossible. >> However, It's entirely possible that Dan has described a >> representation/implementation that causes cycles :) > > I believe you are right. I'm not saying cycles are bad, for my use case I don't actually care whether the representation is acyclic or not, it's just to make sure we don't make the graph more restrictive than it has to be. > >>> The reason why I assumed direction wasn't needed is because I thought the >>> relation `alias` was symmetric: if x aliases y then y alias x in all cases. >>> Is this the case for tbaa? >> >> It should not be, but LLVM may have done this :) >> See the difference in GCC's alias.c between alias_sets_conflict_p and >> alias_set_subset_of_p. >> >> You can see where the subset (one-way checking) is used in tree-ssa-alias.c >> >> If LLVM does always use this rule, and plans on keeping it that way, >> you are correct that there is no point in having a directed graph at >> all, since you aren't using the direction. > > That's quite strange. I'm not too familiar with the implementation, but it sounds like keeping the direction is easier than dropping it (despite what the docs say and the symmetry of `alias`), so a change to using DAGs is probably most likely to be accepted. (How do I go about doing this by the way? Do it and send someone a patch?)You would normally start a discussion (which Dan i think already did somewhere), provide motivating examples, and a patch to do it, and send it to the mailing list. Note that your idea of undirected graphs and adding transitive edges would require transitive closure of the graph, which is N^3 initially. Since metadata can be added at runtime, you would have to redo this transitive closure on updates to metadata. In the presence of no-cycles, this is at least N^2 per update. ( there are significantly more complex data structures that reduce the time, but ...).> > Thanks heaps, > -- > Tran (orodru.in)