thr3ads.net - llvm dev - [llvm-dev] RFC: Resolving TBAA issues [Aug 2017]

Hal Finkel via llvm-dev

2017-Aug-20 16:27 UTC

[llvm-dev] RFC: Resolving TBAA issues

On 08/20/2017 11:22 AM, Daniel Berlin via llvm-dev wrote:
> Sorry, hit send early.
>
>
> On Sun, Aug 20, 2017 at 9:16 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
>
>     On Sun, Aug 20, 2017 at 8:54 AM, Ivan A. Kosarev via llvm-dev
>     <llvm-dev at lists.llvm.org> wrote:
>
>         Hello Daniel,
>
>         > The problem with the way you are trying to show this is that
>         > there are many ways to prove no-alias, and TBAA is one of them.
>         > The reason i stare at dump files and debug info is precisely to
>         > separate the TBAA portion from the rest.
>
>         Makes sense to me. However, for a translation unit like this:
>
>           struct BUF1 { ... };
>           struct BUF2 { ... };
>
>           int foo(int n, struct BUF1* p, struct BUF2* q) {
>              for (int i = 0; i < n; i++)
>                  p->b1 += q->b2;
>              return 0;
>           }
>
>         I think we can be sure there are no ways for the compiler to
>         know that these two accesses do not overlap, except TBAA.
>
>
>     This is definitely false in general.
>     Again, speaking about GCC, the logic for whether fields can be
>     accessed is separate from the logic about whether TBAA says fields
>     can be accessed.
>     In some cases the flags to control the logic are both controlled
>     by fstrict-aliasing, but are unrelated to tbaa.
>
Our current TBAA combines these two things (field-offset-based 
determinations and strictly-type-based rules) into what we call TBAA. 
This proposal does likewise. Are there advantages to splitting them that 
we should consider?

Thanks again,
Hal
>
>     Even if you have tried to place the fields at the same offset, as
>     you have, whether it can disambiguate the accesses can depend on
>     more than just TBAA, including alignment rules, etc.
>
>
> You definitely may be able to come up with examples where only tbaa
> *should* be active, but i don't think it's really a safe way to go
> about testing assumptions about TBAA.
> For example, it also assumes no bugs in the other methods of analysis,
> which is definitely not a safe assumption :)
>
> If you only care about the *end result* (IE whether it's allowed to say
> the accesses overlap) it is generally going to be okay, but again,
> this assumes no bug in any implementation.
>
> If you want to test tbaa specific things for real, you'd have to print
> the tbaa trees and results as gcc sees them, for example.
> That's really the only way to be sure.
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Daniel Berlin via llvm-dev

2017-Aug-20 16:31 UTC

[llvm-dev] RFC: Resolving TBAA issues

On Sun, Aug 20, 2017 at 9:27 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 08/20/2017 11:22 AM, Daniel Berlin via llvm-dev wrote:
>
> Sorry, hit send early.
>
>
> On Sun, Aug 20, 2017 at 9:16 AM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
>>
>>
>> On Sun, Aug 20, 2017 at 8:54 AM, Ivan A. Kosarev via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hello Daniel,
>>>
>>> > The problem with the way you are trying to show this is that
>>> > there are many ways to prove no-alias, and TBAA is one of them.
>>> > The reason i stare at dump files and debug info is precisely to
>>> > separate the TBAA portion from the rest.
>>>
>>> Makes sense to me. However, for a translation unit like this:
>>>
>>>   struct BUF1 { ... };
>>>   struct BUF2 { ... };
>>>
>>>   int foo(int n, struct BUF1* p, struct BUF2* q) {
>>>      for (int i = 0; i < n; i++)
>>>          p->b1 += q->b2;
>>>      return 0;
>>>   }
>>>
>>> I think we can be sure there are no ways for the compiler to know that
>>> these two accesses do not overlap, except TBAA.
>>
>>
>> This is definitely false in general.
>> Again, speaking about GCC, the logic for whether fields can be accessed
>> is separate from the logic about whether TBAA says fields can be accessed.
>> In some cases the flags to control the logic are both controlled by
>> fstrict-aliasing, but are unrelated to tbaa.
>>
>
> Our current TBAA combines these two things (field-offset-based
> determinations and strictly-type-based rules) into what we call TBAA. This
> proposal does likewise. Are there advantages to splitting them that we
> should consider?
>

Yes.
GEP has no relation to the original field accesses, as you know (i.e., we
allow them to access negative offsets, etc.).
For a lot of these languages, more than just the TBAA rules say that you
can't just go marching through structures, etc.

We also have cases TBAA cannot disambiguate but C/C++ says the fields can't
overlap. We lack the metadata to correctly say they cannot, because it
cannot be inferred from geps.
(If you search for the threads with Taewook from a while back, you will
find some patches and discussion.)


I do not believe the current proposal will solve all of those cases,
particularly when the fields are the same type and structures are
compatible but they cannot overlap in C/C++ anyway.

Daniel Berlin via llvm-dev

2017-Aug-20 17:02 UTC

[llvm-dev] RFC: Resolving TBAA issues

>
>
>
> I do not believe the current proposal will solve all of those cases,
> particularly when the fields are the same type and structures are
> compatible but they cannot overlap in C/C++ anyway.
>

One of the threads is titled "[PATCH] D20665: Claim NoAlias if two GEPs
index different fields of the same struct".

For example, given
struct {
  int arr_a[2];
  int arr_b[2];
};
assume you cannot see the original allocation site.
In LLVM IR, gep(arr_b, -1) is legally an access to arr_a[1].
You can use -1 even though it's going to be a pointer to [2 x i32].
Thus, you can't even tell that gep(arr_a, 0) and gep(arr_b, -1) do not
overlap without being able to know *something* about the layout of fields
in the structure you are talking about.

I'd start with: it should not require TBAA to determine that loads from
GEPs into arr_a and arr_b cannot overlap. That is true regardless of the
types involved.
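
To make the layout point concrete, here is a minimal C sketch (not LLVM
code) that models each access as a byte range relative to the start of the
struct. The names struct S, byte_range, int_access and ranges_overlap are
made up for illustration, and the sketch assumes element-wise (int-sized)
indexing and no padding between the two int arrays. Once the field offsets
are known, deciding overlap is plain interval arithmetic; the index -1 on
its own says nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical struct mirroring the example above. */
struct S {
    int arr_a[2];
    int arr_b[2];
};

/* Half-open byte range [begin, end), relative to the start of struct S. */
struct byte_range {
    ptrdiff_t begin;
    ptrdiff_t end;
};

/*
 * Byte range touched by an int-sized access at element `index` of an array
 * field that starts at `field_offset`. A negative index models a GEP that
 * steps backwards out of the field it was based on.
 */
static struct byte_range int_access(size_t field_offset, ptrdiff_t index)
{
    struct byte_range r;
    r.begin = (ptrdiff_t)field_offset + index * (ptrdiff_t)sizeof(int);
    r.end = r.begin + (ptrdiff_t)sizeof(int);
    return r;
}

/* Two non-empty half-open ranges overlap iff each starts before the other ends. */
static bool ranges_overlap(struct byte_range x, struct byte_range y)
{
    assert(x.begin < x.end && y.begin < y.end);
    return x.begin < y.end && y.begin < x.end;
}
```

Under those assumptions, int_access(offsetof(struct S, arr_b), -1) covers
the same bytes as int_access(offsetof(struct S, arr_a), 1) and is disjoint
from the arr_a[0] range, which is the kind of conclusion that needs layout
information rather than type information.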

In terms of "who cares", Google definitely compiles with
-fno-strict-aliasing (because third party packages are still not clean
enough), and last i looked, Apple did the same (but i admittedly have not
kept up).

GCC can definitely disambiguate field accesses (through points-to and
otherwise) better than LLVM in a situation where strict aliasing is off.

As an aside, i also can't build a sane field-sensitive points-to on our
current type system, because the types and structures are already
meaningless  (and we are busy making it weaker, too).
I don't think we are going to want to tie field-sensitive points-to to TBAA
(you definitely want to be able to run the former without the latter), but
right now that is the only metadata you can use.

Finally, the merging of TBAA is definitely going to be more conservative
than the merging of field offset info: if we merge a load of an int and a
float, we will, IIRC, go to the nearest common ancestor in TBAA. The
field offset info may actually still be identical between the two, but we
will lose it by creating, or going to, the common ancestor.
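
A rough illustration of that merging behaviour, as a self-contained C
sketch. The node names imitate what Clang typically emits for scalar C
types, but the tree, type_node and nearest_common_ancestor are assumptions
made for illustration and are not LLVM's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type-tree node: every type points at its parent type. */
struct type_node {
    const char *name;
    const struct type_node *parent; /* NULL for the root */
};

/* Assumed tree shape: root -> "omnipotent char" -> { int, float }. */
static const struct type_node tbaa_root  = { "root", NULL };
static const struct type_node tbaa_char  = { "omnipotent char", &tbaa_root };
static const struct type_node tbaa_int   = { "int", &tbaa_char };
static const struct type_node tbaa_float = { "float", &tbaa_char };

/* Number of parent links between a node and its root. */
static int depth(const struct type_node *n)
{
    int d = 0;
    while (n->parent != NULL) {
        n = n->parent;
        d++;
    }
    return d;
}

/*
 * Walk the deeper node up until both are at the same depth, then walk both
 * up in lockstep until they meet. Returns NULL if the nodes belong to
 * different trees.
 */
static const struct type_node *
nearest_common_ancestor(const struct type_node *a, const struct type_node *b)
{
    assert(a != NULL && b != NULL);
    int da = depth(a);
    int db = depth(b);
    while (da > db) { a = a->parent; da--; }
    while (db > da) { b = b->parent; db--; }
    while (a != b) {
        a = a->parent;
        b = b->parent;
    }
    return a;
}
```

In this toy tree, merging an int access with a float access lands on
"omnipotent char". A merged tag of that kind describes only the type, so
offset information that happened to be identical on both original accesses
is exactly what gets dropped.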

Ivan A. Kosarev via llvm-dev

2017-Aug-23 14:26 UTC

[llvm-dev] RFC: Resolving TBAA issues

Daniel,

 > GEP has no relation to original field accesses, as you know (IE
 > we allow them to access negative offsets, etc)
 > For a lot of these languages, more than the TBAA rules say that
 > you can't just  go marching through structures, etc.

So with the current approach we mix two different things: alias rules
for types and information about specific accesses, such as offsets. This
means that whatever we can conclude from a pair of accesses represented
with such a mix can never extend beyond the scope of what Clang treats
as a single access, that is, an expression of the form 'p->a.b.c'. The
same expression split into parts, e.g., 'p2 = &p->a.b; p2->c', results
in a less specific description of the access and, as a consequence, in a
greater number of potential false positives. In turn, proving that 'p2'
relates to 'p' is up to analyses that deal with memory locations and not
memory accesses. It looks like the current approach leads us nowhere in
the long term.
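
For reference, the two forms look like this in C. This is only a sketch
with made-up struct definitions (inner, middle, outer) and helper names;
it shows that both forms name the same memory location, while what
access-path information a frontend attaches to each is outside what plain
C can show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nested structs so that 'p->a.b.c' is a valid access path. */
struct inner  { int pad; int c; };
struct middle { int pad; struct inner b; };
struct outer  { int pad; struct middle a; };

/* Single expression: the whole path from 'p' down to 'c' is visible at once. */
static int *direct_access(struct outer *p)
{
    assert(p != NULL);
    return &p->a.b.c;
}

/* Split form: the final access only mentions 'p2', a pointer to struct inner. */
static int *split_access(struct outer *p)
{
    assert(p != NULL);
    struct inner *p2 = &p->a.b;
    return &p2->c;
}

/* Returns nonzero when both forms yield the same address. */
static int split_matches_direct(void)
{
    struct outer o = { 0 };
    return direct_access(&o) == split_access(&o);
}
```

Both functions return the same address for the same 'p', yet in the split
form the final access is expressed purely in terms of struct inner, so
relating it back to struct outer needs reasoning about where 'p2' points.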

If I understand correctly, purging offsets from the TBAA information
means we end up with a sort of alias sets. The offsets then go into
another metadata tag that encodes accesses in terms of constraint
expressions. These tags are supposed to be processed by what should
eventually become an implementation of field-sensitive points-to
analysis. This would also resolve the BasicAA vs. TBAA responses issue.

I wonder if !tbaa tags for loads and stores, reworked to refer to both
alias sets and constraint expressions, would work as a transitional
format while we feel our way toward full field-sensitive analysis.

Thanks,

--
