thr3ads.net - llvm dev - [llvm-dev] Field sensitive alias analysis? [Dec 2015]

If this information is useful, please help other people find it:
Share via:

Dmitry Polukhin via llvm-dev

2015-Dec-09 10:33 UTC

[llvm-dev] Field sensitive alias analysis?

Hi Daniel,

I see your point about LLVM and C/C++ type agnostic. I think TBAA was
invented to partially cover this gap and give optimization opportunities
when LLVM types are not sufficient but C/C++ types have required
information.

What do you think about following example:
struct S {
  int a[10];
  int b;
};

int foo(struct S *ps, int i) {
  ps->a[i] = 1;
  ps->b = 2;
  return ps->a[0];
}

define i32 @foo(%struct.S* nocapture %ps, i32 %i) #0 {
entry:
  %idxprom = sext i32 %i to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32
0, i64 %idxprom
  store i32 1, i32* %arrayidx, align 4, !tbaa !1
  %b = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 1
  store i32 2, i32* %b, align 4, !tbaa !5
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32
0, i64 0
  %0 = load i32, i32* %arrayidx2, align 4, !tbaa !1
  ret i32 %0
}

!1 = !{!2, !2, i64 0}
!2 = !{!"int", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}
!5 = !{!6, !2, i64 40}
!6 = !{!"S", !3, i64 0, !2, i64 40}

Missing information here is the range inside struct S that could be
accessed. Also as you can see array member of struct in TBAA is presented
as omnipotent char not as an array of int.

Arrays in struct in TBAA can be represented something like this:
!6 = !{!"S", !7, i64 0, !2, i64 40}
!7 = !{!"<unique id of int[10]>", !2, i64 0}

And 'ps->a[i]' could have TBAA like this:
!8 = !{!6, !7, i64 0}

As far as I can see if struct is enclosed in another struct, information
about inner struct get lost only offset present. But I think for arrays it
is better to keep array type in TBAA for the struct and element accesses.

    Dmitry

On Tue, Dec 8, 2015 at 6:44 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> Just remember that to LLVM, TBAA info doesn't tell it about types.
>
> So when you say " Differentiating array subscripts also seems like out
of
> the scope of TBAA but adding type information that this memory assess
> accessing array not a random object of given type could be useful."
>
> Remember that LLVM has no notion of C/C++ types.
>
> It doesn't know about them, it doesn't care about them.
>
> What TBAA is telling it is that certain memory locations are disjoint.
> It will *never* understand that what is being accessed is "a C++
struct
> containing an array of shorts", because the type system doesn't
know what a
> "C struct" is, or a "C array" or a  "C
short".
>
> Thus, while helpful, TBAA info in LLVM tells it *nothing* about the types,
> it only tells it that "if i am accessing this offset, it is disjoint
with
> accessing this other offset".
>
>
> On Tue, Dec 8, 2015 at 3:38 AM, Dmitry Polukhin <dmitry.polukhin at
gmail.com
> > wrote:
>
>> Jeroen, thank you for very useful link with the context. Indeed union
>> cases are very complicated and I see places in code when TBAA gives up.
>>
>> Daniel, I completely agree that TBAA has limited power and can solve
>> relatively simple cases only. So anything more complicated that
involves
>> intermediate variables that points to struct or array elements cannot
be
>> solved by TBAA alone. Differentiating array subscripts also seems like
out
>> of the scope of TBAA but adding type information that this memory
assess
>> accessing array not a random object of given type could be useful.
>> Therefore TBAA can compliment points-to analysis and sometimes help in
>> cases when points-to may not have enough information.
>>
>> On Mon, Dec 7, 2015 at 7:43 PM, Daniel Berlin <dberlin at
dberlin.org>
>> wrote:
>>
>>> TBAA is about types, not accesses.
>>> TBAA "struct path data" is about access paths and types
that are the end
>>> of access paths, for the most part.
>>>
>>> It has no notion of access size, etc, only offset.
>>>
>>> It is possible to extend it and handle *very basic* cases in the
>>> frontend (IE structs not containing unions anywhere, with constant
>>> accesses, etc)
>>> But it would degrade quite quickly (you would likely need a sane
way to
>>> say "this is an access to an unknown offset into this
type")
>>>
>>> Generally, rather than just try to produce metadata in the
frontend,
>>> most compilers perform field-sensitive points-to or something
similar for
>>> fields, and then rely on data dependence for differentiating array
>>> subscripts.
>>>
>>> (This is what GCC does, LLVM has CFL-AA, but it's not field
sensitive
>>> yet)
>>>
>>> So handling .size vs .a[] is probably possible in the frontend.
>>>
>>> Doing array subscript analysis in general, probably not something
you
>>> want in the frontend.
>>> Handling tricky cases of what pointers to structs point to,
probably the
>>> domain of field-sensitive points-to
>>>
>>> On Mon, Dec 7, 2015 at 7:13 AM, Dmitry Polukhin via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> BTW, I have found why it doesn't work for arrays. TBAA
information
>>>> propagation is not implemented in
CodeGenFunction::EmitArraySubscriptExpr
>>>> with "TODO: Preserve/extend path TBAA metadata?".
>>>>
>>>> On Fri, Dec 4, 2015 at 1:38 PM, Dmitry Polukhin <
>>>> dmitry.polukhin at gmail.com> wrote:
>>>>
>>>>> As far as I can see it is specifics of arrays inside
structs. Current
>>>>> TBAA does distinguish non-array members with path sensitive
TBAA (see !1
>>>>> and !6 in my example below, TBAA has reference to struct !2
and offset). As
>>>>> for arrays information that it was member of some struct
get lost
>>>>> completely (!7 has nothing about struct !2).
>>>>>
>>>>> struct S {
>>>>>   int a;
>>>>>   int b;
>>>>>   int c[3];
>>>>> };
>>>>>
>>>>> void foo(struct S* p) {
>>>>>   p->a = 1;
>>>>>   p->b = 2;
>>>>>   p->c[0] = 3;
>>>>>   p->c[1] = 4;
>>>>> }
>>>>>
>>>>> define void @foo(%struct.S* nocapture %p) #0 {
>>>>> entry:
>>>>>   %a = getelementptr inbounds %struct.S, %struct.S* %p, i64
0, i32 0
>>>>>   store i32 1, i32* %a, align 4, !tbaa !1
>>>>>   %b = getelementptr inbounds %struct.S, %struct.S* %p, i64
0, i32 1
>>>>>   store i32 2, i32* %b, align 4, !tbaa !6
>>>>>   %arrayidx = getelementptr inbounds %struct.S, %struct.S*
%p, i64 0,
>>>>> i32 2, i64 0
>>>>>   store i32 3, i32* %arrayidx, align 4, !tbaa !7
>>>>>   %arrayidx2 = getelementptr inbounds %struct.S, %struct.S*
%p, i64 0,
>>>>> i32 2, i64 1
>>>>>   store i32 4, i32* %arrayidx2, align 4, !tbaa !7
>>>>>   ret void
>>>>> }
>>>>>
>>>>> !0 = !{!"clang version 3.8.0 "}
>>>>> !1 = !{!2, !3, i64 0}
>>>>> !2 = !{!"S", !3, i64 0, !3, i64 4, !4, i64 8}
>>>>> !3 = !{!"int", !4, i64 0}
>>>>> !4 = !{!"omnipotent char", !5, i64 0}
>>>>> !5 = !{!"Simple C/C++ TBAA"}
>>>>> !6 = !{!2, !3, i64 4}
>>>>> !7 = !{!3, !3, i64 0}
>>>>>
>>>>> I'm just start learning how TBAA in clang works so I
don't know why it
>>>>> was implemented this way.
>>>>>
>>>>> On Fri, Dec 4, 2015 at 11:06 AM, Vaivaswatha Nagaraj via
llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to optimize a simple C code and came
across a situation
>>>>>> where invariant code is not being moved out:
>>>>>>
>>>>>> On an -O3 compilation, I noticed that the
"load" for the loop bounds
>>>>>> (which remain invariant throughout) happens on each
iteration of both the
>>>>>> loops, even though it is not modified anywhere in the
function "bigLoop".
>>>>>> It seems that alias analysis is not able to say that
the writes to one
>>>>>> field in the structure does not impact the other field,
leading to LICM
>>>>>> being ineffective.
>>>>>>
>>>>>> Do any of the alias analyses currently have some kind
of field
>>>>>> sensitivity that can help in this case?
>>>>>>
>>>>>> ------------------------- test case
>>>>>> ------------------------------------
>>>>>> #include <stdlib.h>
>>>>>> #include <stdio.h>
>>>>>>
>>>>>> #define SIZE 100
>>>>>>
>>>>>> struct AS {
>>>>>>   int a[SIZE+4];
>>>>>>   int size;
>>>>>> } A;
>>>>>>
>>>>>> void bigLoop(void)
>>>>>> {
>>>>>>   unsigned i, j;
>>>>>>
>>>>>>   for (i = 0; i < A.size; i++) {
>>>>>>     A.a[i+2] +=  A.a[i];
>>>>>>   }
>>>>>>   for (i = 0; i < A.size; i++) {
>>>>>>     A.a[i+2] *=  A.a[i];
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>>   A.size = random()%SIZE;
>>>>>>   for (unsigned i = 0; i < A.size; i++) {
>>>>>>     A.a[i] = random()%23;
>>>>>>   }
>>>>>>   bigLoop();
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>   - Vaivaswatha
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/d71b3e2e/attachment.html>

Daniel Berlin via llvm-dev

2015-Dec-09 16:39 UTC

head link

[llvm-dev] Field sensitive alias analysis?

On Wed, Dec 9, 2015 at 2:33 AM, Dmitry Polukhin <dmitry.polukhin at
gmail.com>
wrote:
> Hi Daniel,
>
> I see your point about LLVM and C/C++ type agnostic. I think TBAA was
> invented to partially cover this gap and give optimization opportunities
> when LLVM types are not sufficient but C/C++ types have required
> information.
>
> What do you think about following example:
> struct S {
>   int a[10];
>   int b;
> };
>
> int foo(struct S *ps, int i) {
>   ps->a[i] = 1;
>   ps->b = 2;
>   return ps->a[0];
> }
>
> define i32 @foo(%struct.S* nocapture %ps, i32 %i) #0 {
> entry:
>   %idxprom = sext i32 %i to i64
>   %arrayidx = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32
> 0, i64 %idxprom
>   store i32 1, i32* %arrayidx, align 4, !tbaa !1
>   %b = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 1
>   store i32 2, i32* %b, align 4, !tbaa !5
>   %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0,
> i32 0, i64 0
>   %0 = load i32, i32* %arrayidx2, align 4, !tbaa !1
>
I'm not entirely sure why TBAA is necessary to disambiguate ps->a from
ps->b, it looks like basicaa should already be able to say they don't
overlap.
Does this not happen?


>   ret i32 %0
> }
>
> !1 = !{!2, !2, i64 0}
> !2 = !{!"int", !3, i64 0}
> !3 = !{!"omnipotent char", !4, i64 0}
> !4 = !{!"Simple C/C++ TBAA"}
> !5 = !{!6, !2, i64 40}
> !6 = !{!"S", !3, i64 0, !2, i64 40}
>
> Missing information here is the range inside struct S that could be
> accessed.
>
What do you mean by "could be accessed".  Do you mean "valid to
access in
C"?

> Also as you can see array member of struct in TBAA is presented as
> omnipotent char not as an array of int.
>
Agreed.


> Arrays in struct in TBAA can be represented something like this:
> !6 = !{!"S", !7, i64 0, !2, i64 40}
> !7 = !{!"<unique id of int[10]>", !2, i64 0}
>
> And 'ps->a[i]' could have TBAA like this:
> !8 = !{!6, !7, i64 0}
>

Yes. This should likely work. Note that size, while nice, is harder.

One thing that is sadly still common (at least in C) is to do this:

struct S {
  int b;
  int a[0]; // or 1
};

and malloc it at (sizeof S + 40 * sizeof (int)), then write into a[1...39].


If we want to break that, it is likely a lot of stuff gets broken (at one
point when we did it in gcc, we broke 80% of all the packages in a given
linux distro ....)
>
> As far as I can see if struct is enclosed in another struct, information
> about inner struct get lost only offset present. But I think for arrays it
> is better to keep array type in TBAA for the struct and element accesses.
>
Don't get me wrong, i think that it would be nice to have offset and size,
and gcc does indeed track this info on it's own.

I'm just trying to understand where you think it will provide better info.

Because once you get into cases like:

struct S {
  int a[10];
  int b;
};

int foo(struct S *ps, int *i) {
  ps->a[i] = 1;
  *i = 3;
  return ps->b;
}


You have no guarantee, for example, that *i and *(ps->b) are not the same
memory.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/b90da4eb/attachment.html>

Dmitry Polukhin via llvm-dev

2015-Dec-10 09:03 UTC

head link

[llvm-dev] Field sensitive alias analysis?

Please see inline.

struct S {>>   int a[10];
>>   int b;
>> };
>>
>> int foo(struct S *ps, int i) {
>>   ps->a[i] = 1;
>>   ps->b = 2;
>>   return ps->a[0];
>> }
>>
>> define i32 @foo(%struct.S* nocapture %ps, i32 %i) #0 {
>> entry:
>>   %idxprom = sext i32 %i to i64
>>   %arrayidx = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0,
>> i32 0, i64 %idxprom
>>   store i32 1, i32* %arrayidx, align 4, !tbaa !1
>>   %b = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 1
>>   store i32 2, i32* %b, align 4, !tbaa !5
>>   %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0,
>> i32 0, i64 0
>>   %0 = load i32, i32* %arrayidx2, align 4, !tbaa !1
>>
>
> I'm not entirely sure why TBAA is necessary to disambiguate ps->a
from
> ps->b, it looks like basicaa should already be able to say they
don't
> overlap.
> Does this not happen?
>
Opps, you are right in my example basicaaa could do it potentially. Correct
example is slightly different:
int foo(struct S *ps, int i) {
  ps->a[i] = 1;
  ps->b = 2;
  return ps->a[i];
}
Here basicaa cannot make sure that 'ps->a[i]' doesn't change
after 'ps->b 2' because if 'i == 10' all 3 memory accesses
will read/write the same
memory. And type information about S::a is required to disambiguate. With
current TBAA 'ps->a[i]' is about random 'int' read.

> Missing information here is the range inside struct S that could be
>> accessed.
>>
>
> What do you mean by "could be accessed".  Do you mean "valid
to access in
> C"?
>
By access I meant read/write memory i.e. that size of S::a inside the
struct or at least information that only S::a is accessed in this place
i.e. not S::b.

>
>> Also as you can see array member of struct in TBAA is presented as
>> omnipotent char not as an array of int.
>>
>
> Agreed.
>
>
>
>> Arrays in struct in TBAA can be represented something like this:
>> !6 = !{!"S", !7, i64 0, !2, i64 40}
>> !7 = !{!"<unique id of int[10]>", !2, i64 0}
>>
>> And 'ps->a[i]' could have TBAA like this:
>> !8 = !{!6, !7, i64 0}
>>
>
>
> Yes. This should likely work. Note that size, while nice, is harder.
>
Yes, knowledge of size is very good thing but it seems that we can do more
even without size. Just using path aware TBAA as we do today but enable it
for arrays.

> One thing that is sadly still common (at least in C) is to do this:
>
> struct S {
>   int b;
>   int a[0]; // or 1
> };
>
> and malloc it at (sizeof S + 40 * sizeof (int)), then write into a[1...39].
>
>
> If we want to break that, it is likely a lot of stuff gets broken (at one
> point when we did it in gcc, we broke 80% of all the packages in a given
> linux distro ....)
>
I absolutely agree that we cannot break this. We only can assume that S::b
is not accessed via S::s with negative index. As far as I know it shouldn't
break good programs.

As far as I can see if struct is enclosed in another struct,
information>> about inner struct get lost only offset present. But I think for arrays
it
>> is better to keep array type in TBAA for the struct and element
accesses.
>>
>
> Don't get me wrong, i think that it would be nice to have offset and
size,
> and gcc does indeed track this info on it's own.
>
> I'm just trying to understand where you think it will provide better
info.
>
> Because once you get into cases like:
>
> struct S {
>   int a[10];
>   int b;
> };
>
> int foo(struct S *ps, int *i) {
>   ps->a[i] = 1;
>   *i = 3;
>   return ps->b;
> }
>
> You have no guarantee, for example, that *i and *(ps->b) are not the
same
> memory.
>
Yes, in this example pointer 'i' can point to S:b or S::a so we cannot
disambiguate it even with sizes and better TBAA. We need restrict somehere
here or information from callgraph to something but it is out of scope TBAA.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151210/be9bcea2/attachment.html>

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Dec 2015 - Field sensitive alias analysis?

[llvm-dev] Field sensitive alias analysis?

[llvm-dev] Field sensitive alias analysis?

[llvm-dev] Field sensitive alias analysis?

Seemingly Similar Threads