thr3ads.net - llvm dev - [llvm-dev] InstCombine GEP [Aug 2017]

Nuno Lopes via llvm-dev

2017-Aug-10 17:58 UTC

[llvm-dev] InstCombine GEP

> On Thu, Aug 10, 2017 at 12:22 AM, Nema, Ashutosh via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> I’m not sure how transforming GEP offset to i8 type will help alias
>> analysis & SROA for the mentioned test case.
>
> It should neither help nor hinder AA or SROA -- the two GEPs (the complex
> one and the simple one) are equivalent.  Since memory isn't typed in
> LLVM, having the GEP in terms of %struct.ABC does not provide any extra
> information.

Memory is somewhat typed, since if you store something with a type and load the
same location with a different type that's not valid (let's call it
poison).
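
To make the quoted equivalence claim concrete, here is a minimal C sketch of why a struct-typed GEP and a flattened byte-offset GEP name the same address. The layout of struct ABC below is invented for illustration (the thread never shows %struct.ABC), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { B_LEN = 8 };

/* Hypothetical stand-in for %struct.ABC; the real fields are not in the thread. */
struct ABC {
    int32_t a;
    int32_t b[B_LEN];
    double  c;
};

/* Typed addressing, analogous to a GEP written in terms of %struct.ABC. */
static const char *typed_address(const struct ABC *s, size_t i, size_t j) {
    assert(j < B_LEN);
    return (const char *)&s[i].b[j];
}

/* Flattened addressing, analogous to an i8 GEP: base plus one byte offset. */
static const char *byte_address(const struct ABC *s, size_t i, size_t j) {
    assert(j < B_LEN);
    size_t offset = i * sizeof(struct ABC)
                  + offsetof(struct ABC, b)
                  + j * sizeof(int32_t);
    return (const char *)s + offset;
}
```

For any in-bounds i and j both functions return the same pointer; the difference is only that the second form no longer mentions the element sizes that produced the offset.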

Also, BasicAA has the following rule, with constants c1 and c2, and arbitrary
values x, y:
a[x][c1] no-alias a[y][c2] if:
the distance between c1 and c2 is sufficient to guarantee that the accesses will
be disjoint due to ending up in different array slots.
For this rule it's important to know the size of each array element. This
information is lost if GEPs are flattened.

But I agree that LLVM itself doesn't exploit types for AA extensively. For
example, a pointer based on a struct field may alias another field of the same
struct, even if at the C/C++ level that's probably not allowed.

Nuno

Nema, Ashutosh via llvm-dev

2017-Aug-11 05:01 UTC

Thanks Nuno & Sanjoy for the inputs.

As you mentioned, the flattened GEPs should neither help nor hinder AA & SROA.
It's good to keep type-based GEPs. I'll make the change and submit it for
review.

Regards,
Ashutosh


Sanjoy Das via llvm-dev

2017-Aug-14 05:24 UTC

Hi,

On Thu, Aug 10, 2017 at 10:58 AM, Nuno Lopes via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>> On Thu, Aug 10, 2017 at 12:22 AM, Nema, Ashutosh via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> I’m not sure how transforming GEP offset to i8 type will help alias
>>> analysis & SROA for the mentioned test case.
>>
>> It should neither help nor hinder AA or SROA -- the two GEPs (the
>> complex one and the simple one) are equivalent.  Since memory isn't
>> typed in LLVM, having the GEP in terms of %struct.ABC does not provide
>> any extra information.
>
> Memory is somewhat typed, since if you store something with a type and load
> the same location with a different type that's not valid (let's call it
> poison).

That may be true in C++, but I'm not sure if we want that to be true
in LLVM IR.  We would not be able to inline memcpy's if that were
true, for one thing (e.g. https://godbolt.org/g/2VVJHU).  Unless
you're talking about TBAA metadata?

> Also, BasicAA has the following rule, with constants c1 and c2, and
> arbitrary values x, y:
> a[x][c1] no-alias a[y][c2] if:
> the distance between c1 and c2 is sufficient to guarantee that the accesses
> will be disjoint due to ending up in different array slots.
> For this rule it's important to know what's the size of each array
> element. This information is lost if GEPs are flattened.

Do you mean to say that in LLVM IR we will conclude ptr0 and ptr1 don't
alias:

  int a[4][4];
  ptr0 = &a[x][3];
  ptr1 = &a[y][7];

If so, that doesn't match my understanding -- I was under the
impression that in LLVM IR x = 2, y = 1 will give us must-alias
between ptr0 and ptr1.
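
The arithmetic behind the x = 2, y = 1 case can be sketched as below. This only computes element offsets the way GEP-style linear indexing would; it deliberately does not evaluate a[1][7] in C, where indexing past the inner array is undefined. ROWS, COLS and flat_index are names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 4 };

/* Element offset of a[row][col] from &a[0][0] for int a[ROWS][COLS].
   col is allowed to exceed COLS - 1 as long as the result stays in the array. */
static size_t flat_index(size_t row, size_t col) {
    size_t idx = row * COLS + col;
    assert(idx < (size_t)ROWS * COLS);
    return idx;
}
```

Here flat_index(2, 3) and flat_index(1, 7) are both 11, so the two pointers refer to the same int.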

-- Sanjoy

Nuno Lopes via llvm-dev

2017-Aug-14 11:39 UTC

> On Thu, Aug 10, 2017 at 10:58 AM, Nuno Lopes wrote:
>>> On Thu, Aug 10, 2017 at 12:22 AM, Nema, Ashutosh via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> I’m not sure how transforming GEP offset to i8 type will help alias
>>>> analysis & SROA for the mentioned test case.
>>>
>>> It should neither help nor hinder AA or SROA -- the two GEPs (the
>>> complex one and the simple one) are equivalent.  Since memory isn't
>>> typed in LLVM, having the GEP in terms of %struct.ABC does not provide
>>> any extra information.
>>
>> Memory is somewhat typed, since if you store something with a type and
>> load the same location with a different type that's not valid (let's call
>> it poison).
>
> That may be true in C++, but I'm not sure if we want that to be true
> in LLVM IR.  We would not be able to inline memcpy's if that were
> true, for one thing (e.g. https://godbolt.org/g/2VVJHU).  Unless
> you're talking about TBAA metadata?

Ah, that's a very good point.  This is a simplified version of your example:
https://godbolt.org/g/RyZYga
memcpy is transformed into a store of an int, which is then loaded as float.

Well, at least according to LLVM semantics, memory records the last stored 
type size, such that it's invalid to store an i12 and load an i13.  Not sure
why this restriction in the semantics is actually needed, though.  If you 
read a smaller/larger type than what was stored, you may end up with some 
padding bits (poison). That's it.
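
For readers without access to the godbolt links, the memcpy case has roughly the following shape. This is a sketch of the pattern, not the contents of either link; bits_to_float is a hypothetical name, and the code assumes a 32-bit IEEE-754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a 32-bit integer into a float.  Per the message above,
   a fixed-size memcpy like this can become an int store followed by a float
   load of the same location. */
static float bits_to_float(uint32_t bits) {
    float f;
    assert(sizeof f == sizeof bits);  /* assumption: float is 32 bits wide */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

If a store of one type followed by a load of another type at the same address were invalid in the IR, this lowering of memcpy would not be allowed.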

>> Also, BasicAA has the following rule, with constants c1 and c2, and
>> arbitrary values x, y:
>> a[x][c1] no-alias a[y][c2] if:
>> the distance between c1 and c2 is sufficient to guarantee that the
>> accesses will be disjoint due to ending up in different array slots.
>> For this rule it's important to know what's the size of each array
>> element. This information is lost if GEPs are flattened.
>
> Do you mean to say that in LLVM IR we will conclude ptr0 and ptr1 don't
> alias:
>
>   int a[4][4];
>   ptr0 = &a[x][3];
>   ptr1 = &a[y][7];
>
> If so, that doesn't match my understanding -- I was under the
> impression that in LLVM IR x = 2, y = 1 will give us must-alias
> between ptr0 and ptr1.

No, in this case it won't conclude no-alias, since 3 % 4 == 7 % 4.  LLVM is
not that aggressive in exploiting UB.  Anyway, concluding no-alias here would
only be possible if the GEP index had the inrange attribute.

The example is more like this:
  int a[4][5];
  p = &a[x][0];
  q = &a[y][1];

With access sizes sp, sq, respectively:
If the access through p ends before q begins (the offset of q within the
row is >= sp) and the access through q doesn't go beyond the end of the row
(sq <= 5*sizeof(int) - 1*sizeof(int)), then it's no-alias.

By flattening a GEP, you lose the size of each of the array/struct
constituents. Hence this proof rule doesn't apply and you would get may-alias
for the example above.
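
The condition above can be written out as plain byte arithmetic. The sketch below is an illustration of the rule as stated in this message, not BasicAA's code; rows_disjoint and overlaps_for_some_rows are hypothetical helpers, and the second one is only a brute-force cross-check over a small number of rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sufficient condition from the message above, in bytes.
   row_size:     size of one inner array (5 * sizeof(int) in the example)
   off_p, off_q: constant byte offsets of p and q inside a row, off_p <= off_q
   sp, sq:       access sizes through p and q */
static bool rows_disjoint(size_t row_size, size_t off_p, size_t sp,
                          size_t off_q, size_t sq) {
    assert(off_p <= off_q && off_q <= row_size);
    return off_p + sp <= off_q        /* access through p ends before q begins */
        && sq <= row_size - off_q;    /* access through q stays inside its row */
}

/* Brute force: do the byte ranges of the two accesses overlap for any pair of
   row indices x, y below `rows`? */
static bool overlaps_for_some_rows(size_t rows, size_t row_size, size_t off_p,
                                   size_t sp, size_t off_q, size_t sq) {
    for (size_t x = 0; x < rows; ++x) {
        for (size_t y = 0; y < rows; ++y) {
            size_t start_p = x * row_size + off_p;
            size_t start_q = y * row_size + off_q;
            if (start_p < start_q + sq && start_q < start_p + sp)
                return true;
        }
    }
    return false;
}
```

With row_size = 5*sizeof(int), off_p = 0, off_q = sizeof(int) and both accesses of size sizeof(int), rows_disjoint holds and the brute-force check finds no overlap for any x, y. Note that the check needs row_size as an input, which is exactly the quantity a flattened GEP no longer carries.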
Another interesting conclusion is that LLVM is being quite nice by allowing 
accesses to multiple array/struct fields through the address of one of them.

The code is here: 
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/BasicAliasAnalysis.cpp?revision=310766&view=markup#l1349
(you may need to scroll back to line 1294 or even to the beginning of that 
function to see where all the data comes from)

Nuno
