thr3ads.net - llvm dev - [LLVMdev] tbaa metadata representation [Mar 2013]

If this information is useful, please help other people find it:
Share via:

Tran Ma

2013-Mar-05 07:12 UTC

[LLVMdev] tbaa metadata representation

Hi all,

A while ago there was a discussion on changing the current "set of
trees"
representation of TBAA metadata to be more expressive, prompted by the need
to support C structs. Dan Gohman also talked about the issue here:
http://llvm.org/devmtg/2012-11/Gohman-AliasAnalysis.pdf. It was suggested
that the trees be replaced by a type DAG then. While working on this
compiler <http://disciple.ouroborus.net/>, I ended up using an undirected
graph to represent aliasing instead, I believe it might be suitable for
TBAA's purposes as well, for the following reasons.

* Does the graph need to be acyclic? Consider these struct types:

    struct a { type1 x; type2 y }

    struct b { type2 y; type3 z }

    struct c { type3 z; type1 x }

They form the following alias graph (sorry about the crappy ascii art):

    a --> b --> c -+

    ^______________|

Which won't be representable if we force our tbaa graph to be acyclic.

* Does it need to be directed? If we use a directed graph, then I suppose
we'd be relying on "reachability" in both directions to find out
if
something aliases something else:

    +-->  char  <--+

    |              |

   float          int

    ^              ^

    |___ MyClass __|

To answer "does myclass alias char?", we'd need to check (myclass
is
reachable from char || char is reachable from myclass), similar to how two
things alias if they are ascendant/descendant of each other in the current
tree representation. Since we're going both ways, I think it makes
sense to just
reify the reachability edges like this (note the lack of direction, now to
check if two things alias, we simply ask if there is an edge connecting
them):

    +---  char  ---+

    |       |      |

  float     |     int

    |___ MyClass __|

This can be represented with a (pretty dense) adjacency matrix and queried
in constant time, so I thought it might be faster.
>From what I can gather without delving deep into the codebase, it's
easiestright now to change from the "set of trees" representation to a
directed
graph, but I don't think changing to an undirected graph would be much
harder (please correct me if I'm wrong). I'd like to try and tackle the
implementation as well, if that's ok.

Thoughts?

Thanks!
-- 
Tran.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130305/ad121a7a/attachment.html>

Daniel Berlin

2013-Mar-06 00:15 UTC

head link

[LLVMdev] tbaa metadata representation

On Mon, Mar 4, 2013 at 11:12 PM, Tran Ma <tranma at cse.unsw.edu.au>
wrote:> Hi all,
>
> A while ago there was a discussion on changing the current "set of
trees"
> representation of TBAA metadata to be more expressive, prompted by the need
> to support C structs. Dan Gohman also talked about the issue here:
> http://llvm.org/devmtg/2012-11/Gohman-AliasAnalysis.pdf. It was suggested
> that the trees be replaced by a type DAG then. While working on this
> compiler, I ended up using an undirected graph to represent aliasing
> instead, I believe it might be suitable for TBAA's purposes as well,
for the
> following reasons.
>
> * Does the graph need to be acyclic? Consider these struct types:
>
>     struct a { type1 x; type2 y }
>
>     struct b { type2 y; type3 z }
>
>     struct c { type3 z; type1 x }
>
> They form the following alias graph (sorry about the crappy ascii art):
>
>     a --> b --> c -+
>
>     ^______________|
>
> Which won't be representable if we force our tbaa graph to be acyclic.
I don't see why your example forms the above graph, could you explain?

I believe it should form a forest with 3 roots (a, b, c), and the
following edges
a->x
a->y
b->y
b->z
c->z
c->x

I'm not sure how you got to a->b->c->a
>
> * Does it need to be directed? If we use a directed graph, then I suppose
> we'd be relying on "reachability" in both directions to find
out if
> something aliases something else:
>
>     +-->  char  <--+
>
>     |              |
>
>    float          int
>
>     ^              ^
>
>     |___ MyClass __|
>
> To answer "does myclass alias char?", we'd need to check
(myclass is
> reachable from char || char is reachable from myclass),
Yes, the graphs essentially represent a set/subset relationship, and
you are checking if either is a subset of the other.
>  similar to how two
> things alias if they are ascendant/descendant of each other in the current
> tree representation.
> Since we're going both ways,
There are actually cases you don't need to go both ways.
>  I think it makes sense to
> just reify the reachability edges like this (note the lack of direction,
now
> to check if two things alias, we simply ask if there is an edge connecting
> them):
>
>     +---  char  ---+
>
>     |       |      |
>
>   float     |     int
>
>     |___ MyClass __|
But this is wrong, TBAA is in fact, directed.

     struct S { int i; double d; };


access to struct S can alias int or double
access to double cannot alias struct S
(access to int can because of the first member rule, but otherwise can't).

In your undirected graph, you will say that a store to double can
alias struct S.
>
> This can be represented with a (pretty dense) adjacency matrix and queried
> in constant time, so I thought it might be faster.
>
> From what I can gather without delving deep into the codebase, it's
easiest
> right now to change from the "set of trees" representation to a
directed
> graph, but I don't think changing to an undirected graph would be much
> harder (please correct me if I'm wrong). I'd like to try and tackle
the
> implementation as well, if that's ok.
>
> Thoughts?
>
> Thanks!
>
> --
> Tran.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Tran Ma

2013-Mar-06 17:31 UTC

head link

[LLVMdev] tbaa metadata representation

On 07/03/2013, at 1:45 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
> On Wed, Mar 6, 2013 at 4:23 AM, Tran Ma <tranma at cse.unsw.edu.au>
wrote:
>> It was derived from what I read in Dan Gohman's slides about one of
the
>> possible forms a type DAG could take. Your forest is what we should get
in
>> the current tree representation (I believe), so when we have a
load/store
>> involving x, we can only annotate the instruction with the x node from
the
>> tree rooted at a or c. If we choose the tree at a, we'd lose the
information
>> that x does not alias z, and if we choose c, we won't know that x
does not
>> alias y. However if we use a graph, such as this:
>> 
>>   a     b
>>  / \   / \
>> v   v v   v
>> x     y     z
>> ^         ^
>>  \__ c __/
>> 
>> Then there wouldn't be a problem. This kind of graph was also
described in
>> the slides as one of the possible representations, I don't know
what was
>> eventually decided upon.
> 
> The tree is what is currently used, but i'm also not sure the graph
> from his slides will really work.
> We'll see.
The second approach (titled "a more precise type DAG") looks
promising, I am convinced that a graph is what we need (the other suggestion was
to make it possible to attach a list of metadata nodes to instructions, which
was deemed too cumbersome), I'm trying to convince myself about the A and D
part.
> You can't actually form cyclic graphs if you follow the typing rules,
> because mutually recursive types are impossible.
> However, It's entirely possible that Dan has described a
> representation/implementation that causes cycles :)
I believe you are right. I'm not saying cycles are bad, for my use case I
don't actually care whether the representation is acyclic or not, it's
just to make sure we don't make the graph more restrictive than it has to
be.
>> The reason why I assumed direction wasn't needed is because I
thought the
>> relation `alias` was symmetric: if x aliases y then y alias x in all
cases.
>> Is this the case for tbaa?
> 
> It should not be, but LLVM may have done this :)
> See the difference in GCC's alias.c between alias_sets_conflict_p and
> alias_set_subset_of_p.
> 
> You can see where the subset (one-way checking) is used in tree-ssa-alias.c
> 
> If LLVM does always use this rule, and plans on keeping it that way,
> you are correct that there is no point in having a directed graph at
> all, since you aren't using the direction.
That's quite strange. I'm not too familiar with the implementation, but
it sounds like keeping the direction is easier than dropping it (despite what
the docs say and the symmetry of `alias`), so a change to using DAGs is probably
most likely to be accepted. (How do I go about doing this by the way? Do it and
send someone a patch?)

Thanks heaps,
--
Tran (orodru.in)

Daniel Berlin

2013-Mar-07 16:42 UTC

head link

[LLVMdev] tbaa metadata representation

On Wed, Mar 6, 2013 at 12:31 PM, Tran Ma <tranma at cse.unsw.edu.au>
wrote:> On 07/03/2013, at 1:45 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>> On Wed, Mar 6, 2013 at 4:23 AM, Tran Ma <tranma at
cse.unsw.edu.au> wrote:
>>> It was derived from what I read in Dan Gohman's slides about
one of the
>>> possible forms a type DAG could take. Your forest is what we should
get in
>>> the current tree representation (I believe), so when we have a
load/store
>>> involving x, we can only annotate the instruction with the x node
from the
>>> tree rooted at a or c. If we choose the tree at a, we'd lose
the information
>>> that x does not alias z, and if we choose c, we won't know that
x does not
>>> alias y. However if we use a graph, such as this:
>>>
>>>   a     b
>>>  / \   / \
>>> v   v v   v
>>> x     y     z
>>> ^         ^
>>>  \__ c __/
>>>
>>> Then there wouldn't be a problem. This kind of graph was also
described in
>>> the slides as one of the possible representations, I don't know
what was
>>> eventually decided upon.
>>
>> The tree is what is currently used, but i'm also not sure the graph
>> from his slides will really work.
>> We'll see.
>
> The second approach (titled "a more precise type DAG") looks
promising, I am convinced that a graph is what we need (the other suggestion was
to make it possible to attach a list of metadata nodes to instructions, which
was deemed too cumbersome), I'm trying to convince myself about the A and D
part.
Well, as you said, if the docs say they always check both directions,
the directioness is unnecessary.
Acyclicness could be guaranteed if necessary, and makes things easier
(otherwise doing walks is hard).


>
>> You can't actually form cyclic graphs if you follow the typing
rules,
>> because mutually recursive types are impossible.
>> However, It's entirely possible that Dan has described a
>> representation/implementation that causes cycles :)
>
> I believe you are right. I'm not saying cycles are bad, for my use case
I don't actually care whether the representation is acyclic or not, it's
just to make sure we don't make the graph more restrictive than it has to
be.
>
>>> The reason why I assumed direction wasn't needed is because I
thought the
>>> relation `alias` was symmetric: if x aliases y then y alias x in
all cases.
>>> Is this the case for tbaa?
>>
>> It should not be, but LLVM may have done this :)
>> See the difference in GCC's alias.c between alias_sets_conflict_p
and
>> alias_set_subset_of_p.
>>
>> You can see where the subset (one-way checking) is used in
tree-ssa-alias.c
>>
>> If LLVM does always use this rule, and plans on keeping it that way,
>> you are correct that there is no point in having a directed graph at
>> all, since you aren't using the direction.
>
> That's quite strange. I'm not too familiar with the implementation,
but it sounds like keeping the direction is easier than dropping it (despite
what the docs say and the symmetry of `alias`), so a change to using DAGs is
probably most likely to be accepted. (How do I go about doing this by the way?
Do it and send someone a patch?)
You would normally start a discussion (which Dan i think already did
somewhere), provide motivating examples, and a patch to do it, and
send it to the mailing list.

Note that your idea of undirected graphs and adding transitive edges
would require transitive closure of the graph, which is N^3 initially.
Since metadata can be added at runtime, you would have to redo this
transitive closure on updates to metadata.
In the presence of no-cycles, this is at least N^2 per update. ( there
are significantly more complex data structures that reduce the time,
but ...).

>
> Thanks heaps,
> --
> Tran (orodru.in)

Apparently Analagous Threads

Search for more maybe matching threads

llvm dev - Mar 2013 - [LLVMdev] tbaa metadata representation

[LLVMdev] tbaa metadata representation

[LLVMdev] tbaa metadata representation

[LLVMdev] tbaa metadata representation

[LLVMdev] tbaa metadata representation

Apparently Analagous Threads