thr3ads.net - llvm dev - [llvm-dev] Field sensitive alias analysis? [Dec 2015]

Dmitry Polukhin via llvm-dev

2015-Dec-10 09:03 UTC

[llvm-dev] Field sensitive alias analysis?

Please see inline.

>> struct S {
>>   int a[10];
>>   int b;
>> };
>>
>> int foo(struct S *ps, int i) {
>>   ps->a[i] = 1;
>>   ps->b = 2;
>>   return ps->a[0];
>> }
>>
>> define i32 @foo(%struct.S* nocapture %ps, i32 %i) #0 {
>> entry:
>>   %idxprom = sext i32 %i to i64
>>   %arrayidx = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 0, i64 %idxprom
>>   store i32 1, i32* %arrayidx, align 4, !tbaa !1
>>   %b = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 1
>>   store i32 2, i32* %b, align 4, !tbaa !5
>>   %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 0, i64 0
>>   %0 = load i32, i32* %arrayidx2, align 4, !tbaa !1
>>
>
> I'm not entirely sure why TBAA is necessary to disambiguate ps->a from
> ps->b, it looks like basicaa should already be able to say they don't
> overlap.
> Does this not happen?
>
Oops, you are right, in my example basicaa could potentially do it. The correct
example is slightly different:
int foo(struct S *ps, int i) {
  ps->a[i] = 1;
  ps->b = 2;
  return ps->a[i];
}
Here basicaa cannot prove that 'ps->a[i]' is unchanged after 'ps->b = 2',
because if 'i == 10' all three memory accesses read/write the same
memory. Type information about S::a is required to disambiguate. With
current TBAA, 'ps->a[i]' is treated as an access to an arbitrary 'int'.
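
To make the 'i == 10' case concrete, here is a minimal C sketch that only
computes byte offsets, so no out-of-bounds access is actually performed. The
helper names offset_of_a_elem and index_lands_on_b are hypothetical, and the
static_assert records an assumption (no padding between a and b) that holds
on common ABIs but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int a[10];
    int b;
};

/* Assumption: the ABI places b immediately after a, with no padding. */
static_assert(offsetof(struct S, b) == 10 * sizeof(int),
              "sketch assumes no padding between a and b");

/* Byte offset, from the start of struct S, of the slot that ps->a[i]
 * would name. Only offsets are computed; no memory is touched. */
size_t offset_of_a_elem(size_t i) {
    return offsetof(struct S, a) + i * sizeof(int);
}

/* Returns 1 when index i lands exactly on field b, 0 otherwise. */
int index_lands_on_b(size_t i) {
    return offset_of_a_elem(i) == offsetof(struct S, b);
}
```

Under that assumption index_lands_on_b(10) is true, which is exactly the
address overlap that an offset-only analysis cannot rule out when 'i' is
unknown.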

>> Missing information here is the range inside struct S that could be
>> accessed.
>>
>
> What do you mean by "could be accessed".  Do you mean "valid to access in
> C"?
>
By access I meant reading or writing memory, i.e. the size of S::a inside the
struct, or at least the information that only S::a is accessed at this point
and not S::b.

>
>> Also as you can see array member of struct in TBAA is presented as
>> omnipotent char not as an array of int.
>>
>
> Agreed.
>
>
>
>> Arrays in struct in TBAA can be represented something like this:
>> !6 = !{!"S", !7, i64 0, !2, i64 40}
>> !7 = !{!"<unique id of int[10]>", !2, i64 0}
>>
>> And 'ps->a[i]' could have TBAA like this:
>> !8 = !{!6, !7, i64 0}
>>
>
>
> Yes. This should likely work. Note that size, while nice, is harder.
>
Yes, knowing the size is very useful, but it seems that we can do more
even without it: just use path-aware TBAA as we do today, but enable it
for arrays.

> One thing that is sadly still common (at least in C) is to do this:
>
> struct S {
>   int b;
>   int a[0]; // or 1
> };
>
> and malloc it at (sizeof S + 40 * sizeof (int)), then write into a[1...39].
>
>
> If we want to break that, it is likely a lot of stuff gets broken (at one
> point when we did it in gcc, we broke 80% of all the packages in a given
> linux distro ....)
>
I absolutely agree that we cannot break this. We can only assume that S::b
is not accessed via S::a with a negative index. As far as I know, that
shouldn't break good programs.
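
For reference, the idiom under discussion looks roughly like the sketch below.
It uses the C99 flexible array member spelling 'int a[]' rather than the older
'a[0]' / 'a[1]' forms quoted above; the names struct V, v_alloc and v_set are
hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct V {
    int b;
    int a[];   /* C99 flexible array member; older code writes a[0] or a[1] */
};

/* Allocate a struct V with room for n trailing ints, all zeroed. */
struct V *v_alloc(size_t n) {
    struct V *v = malloc(sizeof(struct V) + n * sizeof(int));
    if (v != NULL) {
        v->b = 0;
        for (size_t k = 0; k < n; ++k)
            v->a[k] = 0;
    }
    return v;
}

/* Store through a with a non-negative index below n. Such a store lands
 * after b in memory, so it never overwrites v->b. */
void v_set(struct V *v, size_t n, size_t i, int value) {
    assert(i < n);
    v->a[i] = value;
}
```

Any array-aware TBAA has to keep stores like v_set(v, 40, 39, 1) valid, while
it can still conclude that they leave v->b alone.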

>> As far as I can see if struct is enclosed in another struct, information
>> about inner struct get lost only offset present. But I think for arrays it
>> is better to keep array type in TBAA for the struct and element accesses.
>>
>
> Don't get me wrong, I think that it would be nice to have offset and size,
> and gcc does indeed track this info on its own.
>
> I'm just trying to understand where you think it will provide better info.
>
> Because once you get into cases like:
>
> struct S {
>   int a[10];
>   int b;
> };
>
> int foo(struct S *ps, int *i) {
>   ps->a[*i] = 1;
>   *i = 3;
>   return ps->b;
> }
>
> You have no guarantee, for example, that *i and ps->b are not the same
> memory.
>
Yes, in this example the pointer 'i' can point to S::b or into S::a, so we
cannot disambiguate it even with sizes and better TBAA. We would need
'restrict' somewhere here, or information from the call graph or something
similar, but that is out of scope for TBAA.
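
As a rough illustration of the 'restrict' option, the sketch below is a
variant of the example with both parameters qualified. The name foo_restrict
and the choice to index with '*i' are assumptions made for the sketch, not
something taken from the thread.

```c
#include <assert.h>

struct S {
    int a[10];
    int b;
};

/* With both parameters restrict-qualified, the caller promises that *i is
 * not accessed through ps (and vice versa) during the call, so a compiler
 * may assume the store to *i does not modify ps->b. */
int foo_restrict(struct S *restrict ps, int *restrict i) {
    assert(*i >= 0 && *i < 10);   /* keep the array store in bounds */
    ps->a[*i] = 1;
    *i = 3;
    return ps->b;
}
```

Calling it with 'i' pointing into '*ps' would break that promise and be
undefined behaviour, which is why this is an annotation supplied by the
programmer rather than something TBAA can derive.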

Daniel Berlin via llvm-dev

2015-Dec-10 16:09 UTC

[llvm-dev] Field sensitive alias analysis?

> Oops, you are right, in my example basicaa could do it potentially.
> Correct example is slightly different:
> int foo(struct S *ps, int i) {
>   ps->a[i] = 1;
>   ps->b = 2;
>   return ps->a[i];
> }
> Here basicaa cannot make sure that 'ps->a[i]' doesn't change after 'ps->b
> = 2' because if 'i == 10' all 3 memory accesses will read/write the same
> memory. And type information about S::a is required to disambiguate. With
> current TBAA 'ps->a[i]' is about random 'int' read.
>
Yes, and without more info, in LLVM that read can legally touch ps->b.
So that makes sense.

>
>>> Missing information here is the range inside struct S that could be
>>> accessed.
>>
>> What do you mean by "could be accessed".  Do you mean "valid to access in
>> C"?
>
> By access I meant read/write memory i.e. that size of S::a inside the
> struct or at least information that only S::a is accessed in this place
> i.e. not S::b.
>
Okay.

So what you want sounds reasonable to me ;-)

Pete Cooper via llvm-dev

2015-Dec-10 19:00 UTC

[llvm-dev] Field sensitive alias analysis?

> On Dec 10, 2015, at 8:09 AM, Daniel Berlin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> 
> Oops, you are right, in my example basicaa could do it potentially.
> Correct example is slightly different:
> int foo(struct S *ps, int i) {
>   ps->a[i] = 1;
>   ps->b = 2;
>   return ps->a[i];
> }
> Here basicaa cannot make sure that 'ps->a[i]' doesn't change after 'ps->b
> = 2' because if 'i == 10' all 3 memory accesses will read/write the same
> memory. And type information about S::a is required to disambiguate. With
> current TBAA 'ps->a[i]' is about random 'int' read.
> 
> Yes, and without more info, in LLVM that read can legally touch ps->b.

I thought the inbounds on the GEPs would have told us that the a[] accesses
and b access are both in bounds of their respective fields of the struct.

Or does inbounds only tell us that the GEPs are in bounds of ‘ps’ itself?

> So that makes sense.
>  
>  
> Missing information here is the range inside struct S that could be
> accessed.
>
> What do you mean by "could be accessed".  Do you mean "valid to access in
> C"?
>
> By access I meant read/write memory i.e. that size of S::a inside the
> struct or at least information that only S::a is accessed in this place
> i.e. not S::b.
> 
> Okay.
> 
> So what you want sounds reasonable to me ;-)
> 
