thr3ads.net - llvm dev - [llvm-dev] RFC: Representing unions in TBAA [Aug 2017]

Daniel Berlin via llvm-dev

2017-Aug-14 17:04 UTC

[llvm-dev] RFC: Representing unions in TBAA

Do you have a formal description of your approach with examples?
I have a bit of trouble visualizing exactly what your approach does.

On Mon, Aug 14, 2017 at 9:58 AM, Ivan A. Kosarev via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hello Steven, Hal and Daniel,
>
> Thanks a lot for your discussion; it really helps with summarizing current
> TBAA issues and ways to resolve them.
>
> Do you guys know anything of the current status of the proposed change?
> Steven, will you please let us know if the work is in progress and if there
> is any ETA you can share?
>
> I'm asking because we are working on an alternative approach that not only
> supports accesses to union members, bit fields, and fields of aggregate and
> union types, but also allows accesses to aggregates and unions to be
> represented the same way we do it for scalars, so that !tbaa.struct is
> replaced with plain !tbaa, meaning TBAA information can be propagated
> uniformly regardless of the types of accessed objects. As a consequence, it
> supports identification of user types defined in different translation
> units, even if some of them are written in C and others are in C++. It also
> defines a set of language-neutral formal rules that LLVM codegen follows to
> determine whether a given pair of accesses are allowed to overlap by rules
> of the input language. As of today, we know this implementation covers all
> currently supported TBAA functionality reflected in the test suites, and to
> test the new functionality we have improved SROA to preserve TBAA
> information.
>
> The point is, our approach does not try to describe accesses as (type,
> offset) pairs and instead represents access sequences explicitly beginning
> from the base type followed by field descriptors, which is what makes the
> approach so flexible. TypeBasedAAResult::Aliases() and
> MDNode::getMostGenericTBAA() are a bit more complex than they used to be
> (they actually use the same internal function), but rely exclusively on
> linear scans of access sequences, unless we have to check whether one of
> the accessed types is the type of a member of the other one, in which case
> it seems we just have to traverse the fields recursively no matter what.
>
> So, I wonder whether this or similar approaches have been considered
> before and what the cons are, if any have been raised. Do you think it is
> worth considering now?
>
> Thanks again,
>

Ivan A. Kosarev via llvm-dev

2017-Aug-14 17:10 UTC

Sure, I will provide those. I just wanted to make sure this doesn't 
sound like what you know will not work for some reasons I'm not aware of.

On 14/08/17 20:04, Daniel Berlin wrote:
> Do you have a formal description of your approach with examples?
> I have a bit of trouble visualizing exactly what your approach does.

Daniel Berlin via llvm-dev

2017-Aug-14 17:29 UTC

It's hard to say.
What you've described sounds close to a neutral type system implemented in
metadata.
In particular, "It also defines a set of language-neutral formal rules
that LLVM codegen follows to determine whether a given pair of accesses are
allowed to overlap by rules of the input language"
and "the base type followed by field descriptors",
etc.

Despite the name, our current TBAA does not require or represent types. It
represents a translation of language access rules into a language of
hierarchical sets, which are represented by a tree with weighted edges.
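
To make the hierarchical-set idea concrete, here is a minimal Python sketch.
It is not LLVM's implementation: it assumes a scalar-only tree and leaves out
the edge weights (offsets) entirely. The node names and the helpers
`ancestors` and `may_alias` are hypothetical, chosen only for illustration.

```python
# Hypothetical, simplified model of a scalar TBAA tree.
# Each node maps to its parent; the root has no parent.
PARENT = {
    "root": None,
    "omnipotent char": "root",
    "int": "omnipotent char",
    "float": "omnipotent char",
}


def ancestors(node):
    """Return the node itself followed by all of its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def may_alias(a, b):
    """Two tagged accesses may alias only if one tag is an ancestor of
    (or the same as) the other in the tree."""
    return a in ancestors(b) or b in ancestors(a)
```

Under this model, `may_alias("int", "float")` is false because neither node
lies on the other's path to the root, while `may_alias("int", "omnipotent char")`
is true.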

If you are actually attempting to represent neutral types, certainly, the
approach can work, but it probably cannot represent the exact semantics of
all languages.

Most LLVM metadata also tries to avoid understanding the language, instead
modeling the effects.

For example, it's unlikely we'd use metadata to say "This is a struct field
access to a, and this is one to b" and use that in analysis, because that
requires the semantics to be at the LLVM level, and requires LLVM to
understand something about the language.

Instead, we'd usually say "this is an access to offset 0 of memory, with
size 4, and this is an access to offset 4 of memory, with size 8", with
the semantic that accesses tagged in such a manner can only overlap if
their (offset, size) ranges overlap. That semantic is language
independent.
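
As a rough illustration of that semantic, the check reduces to an interval
intersection test. The sketch below is an assumption about how one might write
it, not code from LLVM; the function name `ranges_overlap` and its parameters
are hypothetical, and ranges are treated as half-open, [offset, offset + size).

```python
def ranges_overlap(offset_a, size_a, offset_b, size_b):
    """Return True if [offset_a, offset_a + size_a) and
    [offset_b, offset_b + size_b) share at least one byte."""
    return offset_a < offset_b + size_b and offset_b < offset_a + size_a
```

For the accesses in the example above, `ranges_overlap(0, 4, 4, 8)` is false:
the first access ends at byte 4, exactly where the second begins, so the two
cannot overlap.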

But again, this is all very theoretical. I'd be very interested to see what
you came up with.

On Mon, Aug 14, 2017 at 10:10 AM, Ivan A. Kosarev <ivan at kosarev.info> wrote:
> Sure, I will provide those. I just wanted to make sure this doesn't sound
> like what you know will not work for some reasons I'm not aware of.
>
>
> On 14/08/17 20:04, Daniel Berlin wrote:
>
> Do you have a formal description of your approach with examples?
> I have a bit of trouble visualizing exactly what your approach does.