thr3ads.net - llvm dev - [llvm-dev] RFC: Resolving TBAA issues [Aug 2017]

If this information is useful, please help other people find it:
Share via:

Daniel Berlin via llvm-dev

2017-Aug-17 21:49 UTC

[llvm-dev] RFC: Resolving TBAA issues

>
>
>
> For two given access sequences we can determine if the accessed
> objects are allowed to overlap by the rules of the input
> language.

Sadly, this is where this becomes "unlikely to want to use to replace
TBAA", at least for me. It may still be a thing we want anyway.

This scheme is really an encoding of C/C++ TBAA info so it can be read by
LLVM and requires that *LLVM* have some set of rules that it enforces about
that scheme.
But that scheme is still very language specific in how it is used.
GCC still has something in between this and LLVM, where the language rules
are a bit encoded (but not as much as you have).

We (and gcc) have deliberately avoided such schemes, in favor of
transforming the info into abstract set trees that then tag loads and
stores.

The encoding of "struct path" tbaa, is just a way of trading space vs
time
in that encoding.  We trade walking time for the space of transitive
closure, etc.

None of the TBAA in LLVM really has any *real* relation to the original
type system rules, and that is deliberate.  I've argued for years that
calling it "TBAA" just badly confuses people, and i believe here is a
good
example :)

So i don't think what you've written can be used to replace TBAA.

However, i *do* believe it would be useful to further optimize C and C++
programs in LLVM by using access paths.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/dc45cc46/attachment.html>

Hal Finkel via llvm-dev

2017-Aug-17 23:49 UTC

head link

[llvm-dev] RFC: Resolving TBAA issues

On 08/17/2017 04:49 PM, Daniel Berlin via llvm-dev
wrote:>
>
>
>     For two given access sequences we can determine if the accessed
>     objects are allowed to overlap by the rules of the input
>     language.
>
>
> Sadly, this is where this becomes "unlikely to want to use to replace 
> TBAA", at least for me. It may still be a thing we want anyway.
The rules proposed by Ivan for handling C/C++ seem pretty generic. We 
generally explain our current TBAA rules by saying that they're generic 
but motivated by C/C++ rules. I could say the same thing about this 
proposed system with its proposed rules.
>
> This scheme is really an encoding of C/C++ TBAA info so it can be read 
> by LLVM and requires that *LLVM* have some set of rules that it 
> enforces about that scheme.
> But that scheme is still very language specific in how it is used.
> GCC still has something in between this and LLVM, where the language 
> rules are a bit encoded (but not as much as you have).
>
> We (and gcc) have deliberately avoided such schemes, in favor of 
> transforming the info into abstract set trees that then tag loads and 
> stores.
>
> The encoding of "struct path" tbaa, is just a way of trading
space vs
> time in that encoding.  We trade walking time for the space of 
> transitive closure, etc.
If the provided statistic of 15% holds up, maybe which way we go in this 
trade-off space doesn't matter much?
>
> None of the TBAA in LLVM really has any *real* relation to the 
> original type system rules, and that is deliberate.  I've argued for 
> years that calling it "TBAA" just badly confuses people, and i
believe
> here is a good example :)
>
> So i don't think what you've written can be used to replace TBAA.
So *if* we just take the proposed rules for C/C++ as the rules for the 
scheme in general, I'm not sure this is true. There is an isomorphism it 
seems (we could auto-upgrade even, if we'd like, by mapping the current 
offset value onto the field id of this scheme (we'd need to use a 
different root name, however, so everything would remain consistent 
under LTO)).

  -Hal
>
> However, i *do* believe it would be useful to further optimize C and 
> C++ programs in LLVM by using access paths.
>
>
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/c2fe89e3/attachment.html>

Daniel Berlin via llvm-dev

2017-Aug-18 00:20 UTC

head link

[llvm-dev] RFC: Resolving TBAA issues

On Thu, Aug 17, 2017 at 4:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 08/17/2017 04:49 PM, Daniel Berlin via llvm-dev wrote:
>
>
>>
>> For two given access sequences we can determine if the accessed
>> objects are allowed to overlap by the rules of the input
>> language.
>
>
> Sadly, this is where this becomes "unlikely to want to use to replace
> TBAA", at least for me. It may still be a thing we want anyway.
>
>
> The rules proposed by Ivan for handling C/C++ seem pretty generic.
>
Uh?

"

  [X...] is allowed to overlap with [S1..., X..., S2...]
  and the most generic access sequence is [X...].

  [X1..., X2...] is allowed to overlap with [S1..., X1...]
  with the most generic access sequence to be [X1...].

  [X1..., U, U::m1, X2...] is allowed to overlap with
  [S1..., X1..., U, U::m2, S2...]
  for a union U of an unknown effective type, provided m1 != m2
  and the most generic access sequence is [X1..., U].

If neither of the given sequences contains the leading access
type of the other, then they are allowed to overlap if the
leading access type of one sequence is a direct or indirect field
of the final access type of the other sequence and then the most
generic access sequence consists of a single element, which is
that final access type.

For the purpose of determining whether one type is a direct or
indirect member of another type unions are considered to have no
members as accesses to members of unions are only allowed to
overlap if they have the base union object explicitly specified."

This sounds super specific to me.  It talks about unions, and how they
behave in C/C++, etc.
It has initial sequence rules, etc.
These would make no sense for "Rust", "Go", or
"Haskell" at all.

We generally explain our current TBAA rules by saying that they're
generic> but motivated by C/C++ rules.
>
We do say that but that's not really what our implementation does in any
way.
> I could say the same thing about this proposed system with its proposed
> rules.
>It is, IMHO, nowhere near as generic.

>
>
> This scheme is really an encoding of C/C++ TBAA info so it can be read by
> LLVM and requires that *LLVM* have some set of rules that it enforces about
> that scheme.
> But that scheme is still very language specific in how it is used.
> GCC still has something in between this and LLVM, where the language rules
> are a bit encoded (but not as much as you have).
>
> We (and gcc) have deliberately avoided such schemes, in favor of
> transforming the info into abstract set trees that then tag loads and
> stores.
>
> The encoding of "struct path" tbaa, is just a way of trading
space vs time
> in that encoding.  We trade walking time for the space of transitive
> closure, etc.
>
>
> If the provided statistic of 15% holds up, maybe which way we go in this
> trade-off space doesn't matter much?
>
Sure, that part i'm indifferent about.
>
>
>
> None of the TBAA in LLVM really has any *real* relation to the original
> type system rules, and that is deliberate.  I've argued for years that
> calling it "TBAA" just badly confuses people, and i believe here
is a good
> example :)
>
> So i don't think what you've written can be used to replace TBAA.
>
>
> So *if* we just take the proposed rules for C/C++ as the rules for the
> scheme in general, I'm not sure this is true.
>
Okay, let me try to rephrase a little more precisely:
It is almost certainly true that we *could* make *any* particular lowering
work, at different costs to the front ends and middle ends.  IE i totally
agree that the power of all of these systems (with small extensions) are
equal.  and also, for the record, i certainly think the rules/scheme
described here would almost certainly produce better optimization results
than what is currently implemented.

*But*
I don't see how you use this as a general scheme without ending up forcing
the other frontends to lower things to look as if it was a C/C++ based type
system.
This is definitely *not* true today.

They only have to lower things to look like a tree of sets.  Having been
down this path before, it's definitely easier to lower things to look like
trees of sets than like groups of C/C++ structs/bitfields/etc that follow
certain rules.

This is one of the reasons I'm honestly having a lot of trouble seeing how
this is is definitively better than what IBM's proposal was, which seemed
both more generic and incremental, and handled most concerns.

The only practical difference i see is that this just changes the scheme
from one where the language rules are mostly in the front ends to one where
the language rules are also partially in llvm.  To me that isn't
necessarily better, just different (heck, i'm even more used to that system
because it's a push closer to gcc :P)

So overall I don't see one as particularly better than the other.
You seem to.
So i'd really like to understand your perspective on this in terms of the
pro/con that you are seeing.

I mean, i'm  really not even opposed (though others may be) in principle to
saying "hey, let's just have a set of language specific tbaa passes
with
semi-common metadata".
I'd just like to understand why we would change tradeoffs, etc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/478f35fa/attachment-0001.html>

Ivan A. Kosarev via llvm-dev

2017-Aug-18 14:53 UTC

head link

[llvm-dev] RFC: Resolving TBAA issues

Thanks Daniel, I've read through all your recent messages carefully and 
what you say is very interesting because it sounds like you know how to 
represent everything we need to support C/C++ TBAA rules in a more 
generic way that expresses things in terms of type identifiers and byte 
offsets.

If that is correct, could you please consider this:

   struct A { int i, j; };

   union {
     struct S1 { struct A a; } s1;
     struct S2 { struct A a; } s2;
   } u;

For the definitions above, can you tell how representations of these 
accesses:

   1) u.s1.a.i
   2) u.s1.a.j  // Does not overlap with #1.

and this couple of accesses:

   3) u.s1.a.i
   4) u.s2.a.j  // Does overlap with #3.

can be differentiated in the approach you are talking about?

Thanks,

--

Ivan A. Kosarev via llvm-dev

2017-Aug-19 13:29 UTC

head link

[llvm-dev] RFC: Resolving TBAA issues

Daniel, Hal,

I'm trying to figure out what we would need to do in terms of the 
current approach, accepting its limitations, to support unions, member 
arrays, aggregate accesses and type identifiers and what I get looks 
suspiciously simple. Can you please check it out and let me know if I'm 
missing something?


For unions:

TBAA, regardless of a specific approach, cannot guarantee correct 
answers for couples of accesses that involve pointers to union members.

This means that in terms of TBAA two access paths that come through 
unions may overlap only if the most-enclosing unions in these paths are 
the same union.

Furthermore, two accesses with the same base union type overlap if their 
offset ranges overlap. The access types do not matter. Otherwise, they 
are considered not to overlap and with the current approach we can't say 
better.

This sounds like during scanning through type trees in 
TypeBasedAAResult::Aliases() we should just stop at the first union type 
we have run into and see if:
   a) that union is the base type of the other access and
   b) the offset ranges overlap.

No need to traverse through any union members.

Similarly, in MDNode::getMostGenericTBAA() we do not consider nodes that 
represent union-enclosed types. This means the most generic type may be 
an aggregate or a union, in which case we return null node unless 
aggreagte accesses are supported.

To distinct unions from other types we can have a special group for 
them. Alternatively, we can use type identification nodes, see below.


For aggregate accesses:

The comment in TypeBasedAliasAnalysis.cpp suggests:

   // TODO: We need to check if AccessType of TagA encloses AccessType of
   // TagB to support aggregate AccessType. If yes, return true.

Consider these two accesses:

     struct A { int i; };
     struct B { struct A a; } *b;

     struct X { int i; } *x;

     b->a
     x->i

Following the rule in the comment, the accesses are allowed to overlap, 
which is wrong.

Instead, we should check if the access type of one access encloses the 
base type of the other. And we should only check this if neither base 
type encloses the other.

This relies on traversing through types, but again, we shall never 
follow union-enclosed type nodes as for them to potentially overlap we 
require explicit base specification.


For member arrays:

There are no accesses to arrays, only accesses to array elements. In 
terms of TBAA, an access to an array element is no different than an 
access to an object of the element type. And an access to a member array 
element is no different than an access to a member of the element type. 
All this provided offsets remain the same.

The overall array size does not matter because structure members are 
known not to overlap and we never differentiate between union members.


For type identifiers:

We can add a language-specific node to type metadata tuples that 
guarantees uniqueness of type nodes that are not interchangeable in 
terms of TBAA.

This is supposed to resolve issues like this one:
https://bugs.llvm.org/show_bug.cgi?id=8572

Thanks,

--

Seemingly Similar Threads

Search for more possibly parallel threads

llvm dev - Aug 2017 - RFC: Resolving TBAA issues

[llvm-dev] RFC: Resolving TBAA issues

[llvm-dev] RFC: Resolving TBAA issues

[llvm-dev] RFC: Resolving TBAA issues

[llvm-dev] RFC: Resolving TBAA issues

[llvm-dev] RFC: Resolving TBAA issues

Seemingly Similar Threads