thr3ads.net - llvm dev - [llvm-dev] RFC: Resolving TBAA issues [Aug 2017]

Daniel Berlin via llvm-dev

2017-Aug-20 17:02 UTC

[llvm-dev] RFC: Resolving TBAA issues

>
>
>
> I do not believe the current proposal will solve all of those cases,
> particularly when the fields are the same type and structures are
> compatible but they cannot overlap in C/C++ anyway.
>
One of the threads is titled "[PATCH] D20665: Claim NoAlias if two GEPs
index different fields of the same struct"

For example, given
struct {
  int arr_a[2];
  int arr_b[2];
};
Assume you cannot see the original allocation site.
In LLVM IR, gep(arr_b, -1) is legally an access to arr_a[1].
You can use -1 even though it's going to be a pointer to [2 x i32].
Thus, you can't even tell that gep(arr_a, 0) and gep(arr_b, -1) do not
overlap without being able to know *something* about the layout of fields
in the structure you are talking about.
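
A minimal C sketch of the layout argument, computing byte offsets with
offsetof instead of doing the out-of-bounds pointer arithmetic itself. The
struct name S and the helper names are made up for illustration, and the
first check assumes the usual ABI layout with no padding between arr_a and
arr_b:

```c
#include <assert.h>
#include <stddef.h>

/* The struct from the example; the name S is an assumption. */
struct S {
    int arr_a[2];
    int arr_b[2];
};

/* Byte offset from the start of struct S that indexing arr_a by idx
 * would address. idx may be negative or past the end, as in a GEP. */
ptrdiff_t byte_offset_via_a(ptrdiff_t idx) {
    return (ptrdiff_t)offsetof(struct S, arr_a) + idx * (ptrdiff_t)sizeof(int);
}

/* Same, but indexing from arr_b. */
ptrdiff_t byte_offset_via_b(ptrdiff_t idx) {
    return (ptrdiff_t)offsetof(struct S, arr_b) + idx * (ptrdiff_t)sizeof(int);
}

/* Two int-sized accesses overlap iff their byte ranges intersect. */
int accesses_overlap(ptrdiff_t off1, ptrdiff_t off2) {
    ptrdiff_t size = (ptrdiff_t)sizeof(int);
    return off1 < off2 + size && off2 < off1 + size;
}

void demo_layout(void) {
    /* Assuming no padding, arr_b[-1] lands on the same bytes as arr_a[1]. */
    assert(byte_offset_via_b(-1) == byte_offset_via_a(1));
    /* Knowing the offsets is what lets us say arr_a[0] is disjoint from it. */
    assert(!accesses_overlap(byte_offset_via_a(0), byte_offset_via_b(-1)));
}
```

Without the offsets of arr_a and arr_b, neither check could be made, which
is the point about needing layout information.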

I'd start with: It should not require TBAA to determine that loads from
geps of arr_a and arr_b cannot overlap.  It is true regardless of the
types involved.

In terms of "who cares", Google definitely compiles with
-fno-strict-aliasing (because third party packages are still not clean
enough), and last i looked, Apple did the same (but i admittedly have not
kept up).

GCC can definitely disambiguate field accesses (through points-to and
otherwise) better than LLVM in a situation where strict aliasing is off.

As an aside, i also can't build a sane field-sensitive points-to on our
current type system, because the types and structures are already
meaningless  (and we are busy making it weaker, too).
I don't think we are going to want to tie field-sensitive points-to to TBAA
(you definitely want to be able to run the former without the latter), but
right now that is the only metadata you can use.

Finally, the merging of TBAA is definitely going to be more conservative
than the merging of field offset info: If we merge a load of an int and a
float, we will, IIRC, go to the nearest common ancestor in TBAA.  The
field offset info may actually still be identical between the two, but we
will lose it by creating, or going to, the common ancestor.

Daniel Berlin via llvm-dev

2017-Aug-20 17:11 UTC

[llvm-dev] RFC: Resolving TBAA issues

>
>
>
> Finally, the merging of TBAA is definitely going to be more conservative
> than the merging of field offset info: If we merge a load of an int and a
> float, we will, IIRC, go to the nearest common ancestor in TBAA.   The
> field offset info may actually still be identical between the two, but we
> will lose it by creating/or going to the common ancestor.
>
Imagine
int - offset 4
float - offset 4
int - offset 12

merge(first int field, float) => mergeintfloat - no offset info

You can no longer disambiguate this against second int field, even though
it can't possibly overlap, not for type reasons, but for offset reasons.
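
A toy C model of that example, as a sketch only. It is not LLVM's TBAA
implementation: struct access_tag, merge_to_ancestor and may_alias_tags are
hypothetical names, TY_CHAR stands in for the common ancestor, all accesses
are assumed to be relative to the same base object, and 4-byte int and float
are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* TY_CHAR plays the role of the nearest common ancestor (assumption). */
enum type_tag { TY_CHAR, TY_INT, TY_FLOAT };

struct access_tag {
    enum type_tag type;
    bool has_offset;   /* false once offset info has been dropped */
    size_t offset;
    size_t size;
};

/* Merge that goes to the common ancestor and drops offset info
 * whenever the two tags are not identical. */
struct access_tag merge_to_ancestor(struct access_tag a, struct access_tag b) {
    if (a.type == b.type && a.has_offset && b.has_offset &&
        a.offset == b.offset && a.size == b.size)
        return a;
    struct access_tag merged = { TY_CHAR, false, 0, 0 };
    return merged;
}

bool may_alias_tags(struct access_tag a, struct access_tag b) {
    /* Type-based: two distinct non-char types do not alias. */
    if (a.type != TY_CHAR && b.type != TY_CHAR && a.type != b.type)
        return false;
    /* Offset-based: only usable when both sides still carry offsets. */
    if (a.has_offset && b.has_offset)
        return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
    return true;  /* nothing left to disambiguate with */
}

void demo_naive_merge(void) {
    struct access_tag int4   = { TY_INT,   true, 4,  4 };
    struct access_tag float4 = { TY_FLOAT, true, 4,  4 };
    struct access_tag int12  = { TY_INT,   true, 12, 4 };
    struct access_tag merged = merge_to_ancestor(int4, float4);
    assert(!may_alias_tags(int4, int12));   /* offsets disambiguate */
    assert(may_alias_tags(merged, int12));  /* lost after the merge */
}
```

The merged tag is reported as possibly aliasing the int at offset 12, even
though bytes 4..7 and 12..15 cannot overlap.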


Hal Finkel via llvm-dev

2017-Aug-20 18:22 UTC

[llvm-dev] RFC: Resolving TBAA issues

On 08/20/2017 12:02 PM, Daniel Berlin wrote:
>
>
>     I do not believe the current proposal will solve all of those
>     cases, particularly when the fields are the same type and
>     structures are compatible but they cannot overlap in C/C++ anyway.
>
> One of the threads is titled "[PATCH] D20665: Claim NoAlias if two 
> GEPs index different fields of the same struct"
>
> For example, given
> struct {
>   int arr_a[2];
>   int arr_b[2];
> };
> assume you cannot see the original allocation site.
> in llvm ir gep(arr_b,  -1) is legally an access to arr_a[1].
> You can use -1 even though it's going to be a pointer to [2 x i32].
> Thus, you can't even tell that gep(arr_a, 0) and  gep(arr_b, -1) do 
> not overlap without being able to know *something* about the layout of 
> fields in the structure you are talking about.
>
> I'd start with: It should not require tbaa to determine that loads 
> from geps that arr_a and arr_b cannot overlap.   It is true regardless 
> of the types involved.
>
> In terms of "who cares", Google definitely compiles with 
> -fno-strict-aliasing (because third party packages are still not clean 
> enough), and last i looked, Apple did the same (but i admittedly have 
> not kept up).
We definitely have code that we compile that way as well. As it 
turns out, this is my motivation for developing the type sanitizer (so 
we have some tool that users can employ to clean up this kind of code). 
Patches have been posted for review.
>
> GCC can definitely disambiguate field accesses (through points-to and 
> otherwise) better than LLVM in a situation where strict aliasing is off.
>
> As an aside, i also can't build a sane field-sensitive points-to on 
> our current type system, because the types and structures are already 
> meaningless  (and we are busy making it weaker, too).
> I don't think we are going to want to tie field-sensitive points-to to 
> TBAA (you definitely want to be able to run the former without the 
> latter), but right now that is the only metadata you can use.
This also brings up a good point. Even if we use the same metadata for 
both type and field analysis, I don't see why we can't disable the type 
portions without disabling the field analysis (essentially by emitting 
everything as one universally-aliasing type). Maybe we should do that 
for -fno-strict-aliasing?
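
A rough C sketch of that idea, under stated assumptions: this is a toy
model, not how Clang or LLVM encode TBAA. struct tagged_access, erase_type
and query_may_alias are hypothetical names, T_ANY stands in for the single
universally-aliasing type, and both accesses are assumed to be relative to
the same base object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* T_ANY models the one universally-aliasing type (assumption). */
enum ty { T_ANY, T_INT, T_FLOAT };

struct tagged_access {
    enum ty type;
    size_t offset;
    size_t size;
};

/* What a front end might do for -fno-strict-aliasing in this model:
 * drop the type, keep the field offset and size. */
struct tagged_access erase_type(struct tagged_access a) {
    a.type = T_ANY;
    return a;
}

bool query_may_alias(struct tagged_access a, struct tagged_access b) {
    bool types_compatible =
        a.type == T_ANY || b.type == T_ANY || a.type == b.type;
    bool ranges_overlap =
        a.offset < b.offset + b.size && b.offset < a.offset + a.size;
    return types_compatible && ranges_overlap;
}

void demo_no_strict_aliasing(void) {
    struct tagged_access i4  = { T_INT,   4,  4 };
    struct tagged_access f4  = { T_FLOAT, 4,  4 };
    struct tagged_access i12 = { T_INT,   12, 4 };
    /* With types: int vs. float at the same offset is NoAlias. */
    assert(!query_may_alias(i4, f4));
    /* Types erased: that answer goes away ... */
    assert(query_may_alias(erase_type(i4), erase_type(f4)));
    /* ... but disjoint fields are still told apart by offset. */
    assert(!query_may_alias(erase_type(i4), erase_type(i12)));
}
```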

Thanks again,
Hal
>
> Finally, the merging of TBAA is definitely going to be more 
> conservative than the merging of field offset info: If we merge a load 
> of an int and a float, we will, IIRC, go to the nearest common 
> ancestor in TBAA.   The field offset info may actually still be 
> identical between the two, but we will lose it by creating/or going to 
> the common ancestor.
>
>
>
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Hal Finkel via llvm-dev

2017-Aug-20 18:30 UTC

[llvm-dev] RFC: Resolving TBAA issues

On 08/20/2017 12:11 PM, Daniel Berlin wrote:
>
>
>     Finally, the merging of TBAA is definitely going to be more
>     conservative than the merging of field offset info: If we merge a
>     load of an int and a float, we will, IIRC, go to the nearest
>     common ancestor in TBAA.   The field offset info may actually
>     still be identical between the two, but we will lose it by
>     creating/or going to the common ancestor.
>
>
> Imagine
> int - offset 4
> float - offset 4
> int - offset 12
>
> merge(first int field, float) => mergeintfloat -no offset info
> You can no longer disambiguate this against second int field, even 
> though it can't possibly overlap, not for type reasons, but for offset 
> reasons.
Okay, but I don't see why we have to do that. Could we not do

merge(first int field, float) == mergeintfloat @ offset 4

(where mergeintfloat is probably char or similar)

where we keep matching offsets? Or we encode some kind of disjunction 
directly (which certainly seems reasonable to me for access merging).
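
A small C sketch of the "keep matching offsets" variant, as a toy model
rather than a proposal for the actual metadata format. struct field_access,
merge_keeping_offset and may_overlap are hypothetical names, K_CHAR stands
in for mergeintfloat, all accesses are assumed to share a base object, and
4-byte int and float are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* K_CHAR models "mergeintfloat", i.e. char or similar (assumption). */
enum kind { K_CHAR, K_INT, K_FLOAT };

struct field_access {
    enum kind kind;
    bool has_offset;
    size_t offset;
    size_t size;
};

/* Merge the type part and the offset part independently: the type may
 * widen to K_CHAR while an identical offset and size are preserved. */
struct field_access merge_keeping_offset(struct field_access a,
                                         struct field_access b) {
    struct field_access m;
    m.kind = (a.kind == b.kind) ? a.kind : K_CHAR;
    m.has_offset = a.has_offset && b.has_offset &&
                   a.offset == b.offset && a.size == b.size;
    m.offset = m.has_offset ? a.offset : 0;
    m.size = m.has_offset ? a.size : 0;
    return m;
}

bool may_overlap(struct field_access a, struct field_access b) {
    bool kinds_compatible =
        a.kind == K_CHAR || b.kind == K_CHAR || a.kind == b.kind;
    if (!kinds_compatible)
        return false;
    if (a.has_offset && b.has_offset)
        return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
    return true;
}

void demo_merge_keeping_offset(void) {
    struct field_access int4   = { K_INT,   true, 4,  4 };
    struct field_access float4 = { K_FLOAT, true, 4,  4 };
    struct field_access int12  = { K_INT,   true, 12, 4 };
    struct field_access merged = merge_keeping_offset(int4, float4);
    /* mergeintfloat @ offset 4 is still disjoint from the int @ 12. */
    assert(merged.has_offset && merged.offset == 4);
    assert(!may_overlap(merged, int12));
}
```

When the two offsets differ, this sketch simply drops the offset; encoding a
disjunction of the two accesses would be the other option.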

Thanks again,
Hal
>
>
>
>
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Daniel Berlin via llvm-dev

2017-Aug-20 18:36 UTC

[llvm-dev] RFC: Resolving TBAA issues

On Sun, Aug 20, 2017 at 11:22 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 08/20/2017 12:02 PM, Daniel Berlin wrote:
>
>
>>
>> I do not believe the current proposal will solve all of those cases,
>> particularly when the fields are the same type and structures are
>> compatible but they cannot overlap in C/C++ anyway.
>>
>> One of the threads is titled "[PATCH] D20665: Claim NoAlias if two GEPs
>> index different fields of the same struct"
>
> For example, given
> struct {
>   int arr_a[2];
>   int arr_b[2];
> };
> assume you cannot see the original allocation site.
> in llvm ir gep(arr_b,  -1) is legally an access to arr_a[1].
> You can use -1 even though it's going to be a pointer to [2 x i32].
> Thus, you can't even tell that gep(arr_a, 0) and  gep(arr_b, -1) do not
> overlap without being able to know *something* about the layout of fields
> in the structure you are talking about.
>
> I'd start with: It should not require tbaa to determine that loads from
> geps that arr_a and arr_b cannot overlap.   It is true regardless of the
> types involved.
>
> In terms of "who cares", Google definitely compiles with
> -fno-strict-aliasing (because third party packages are still not clean
> enough), and last i looked, Apple did the same (but i admittedly have not
> kept up).
>
>
> We definitely also have code that we compile that way as well. As it turns
> out, this is my motivation for developing the type sanitizer (so we have
> some tool that users can employ to clean up this kind of code). Patches
> have been posted for review.
>
(and we're looking into using it to do just that :P)

>
> GCC can definitely disambiguate field accesses (through points-to and
> otherwise) better than LLVM in a situation where strict aliasing is off.
>
> As an aside, i also can't build a sane field-sensitive points-to on our
> current type system, because the types and structures are already
> meaningless  (and we are busy making it weaker, too).
> I don't think we are going to want to tie field-sensitive points-to to
> TBAA (you definitely want to be able to run the former without the latter),
> but right now that is the only metadata you can use.
>
>
> This also brings up a good point. Even if we use the same metadata for
> both type and field analysis, I don't see why we can't disable the type
> portions without disabling the field analysis (essentially by emitting
> everything as one universally-aliasing type). Maybe we should do that for
> -fno-strict-aliasing?
>
That actually sounds very reasonable to me, if we can make it work.

Hal Finkel via llvm-dev

2017-Aug-20 19:10 UTC

[llvm-dev] RFC: Resolving TBAA issues

On 08/20/2017 12:02 PM, Daniel Berlin wrote:
>
>
>     I do not believe the current proposal will solve all of those
>     cases, particularly when the fields are the same type and
>     structures are compatible but they cannot overlap in C/C++ anyway.
>
> One of the threads is titled "[PATCH] D20665: Claim NoAlias if two 
> GEPs index different fields of the same struct"
>
> For example, given
> struct {
>   int arr_a[2];
>   int arr_b[2];
> };
> assume you cannot see the original allocation site.
> in llvm ir gep(arr_b,  -1) is legally an access to arr_a[1].
> You can use -1 even though it's going to be a pointer to [2 x i32].
> Thus, you can't even tell that gep(arr_a, 0) and  gep(arr_b, -1) do 
> not overlap without being able to know *something* about the layout of 
> fields in the structure you are talking about.
Agreed (and this certainly does motivate keeping both size and offset 
information for the fields). The other thing that I think is important 
to do in this respect is to record whether or not it's legal to do this 
kind of inter-field indexing. In C, I believe you can always legally do 
this. In C++, it is always legal for standard-layout types, but 
otherwise it is up to the implementation (i.e., it depends on which 
types the implementation allows the offsetof macro to be applied to). In 
saying this, I'm strengthening the wording in the standard in the 
following sense: The C++ rules for pointer arithmetic and safely-derived 
pointer values, at least for implementations with strict pointer safety, 
disallow this kind of inter-field addressing for everything, except 
perhaps in the case of two adjacent variables in standard-layout classes. 
However, it's also clear that whenever you can apply the offsetof macro, 
all of the relative offsets are part of the semantic model of the 
abstract machine, and, due to practical considerations if nothing else, I 
suspect we can't reasonably restrict this behavior for standard-layout 
classes.

Thanks again,
Hal
>
> I'd start with: It should not require tbaa to determine that loads 
> from geps that arr_a and arr_b cannot overlap.   It is true regardless 
> of the types involved.
>
> In terms of "who cares", Google definitely compiles with 
> -fno-strict-aliasing (because third party packages are still not clean 
> enough), and last i looked, Apple did the same (but i admittedly have 
> not kept up).
>
> GCC can definitely disambiguate field accesses (through points-to and 
> otherwise) better than LLVM in a situation where strict aliasing is off.
>
> As an aside, i also can't build a sane field-sensitive points-to on 
> our current type system, because the types and structures are already 
> meaningless  (and we are busy making it weaker, too).
> I don't think we are going to want to tie field-sensitive points-to to 
> TBAA (you definitely want to be able to run the former without the 
> latter), but right now that is the only metadata you can use.
>
> Finally, the merging of TBAA is definitely going to be more 
> conservative than the merging of field offset info: If we merge a load 
> of an int and a float, we will, IIRC, go to the nearest common 
> ancestor in TBAA.   The field offset info may actually still be 
> identical between the two, but we will lose it by creating/or going to 
> the common ancestor.
>
>
>
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

