thr3ads.net - llvm dev - [llvm-dev] TBAA for subset of a loaded value [Apr 2017]

If this information is useful, please help other people find it:
Share via:

Keno Fischer via llvm-dev

2017-Apr-11 01:01 UTC

[llvm-dev] TBAA for subset of a loaded value

I'm interested in what we can do about TBAA for loads that the
compiler inserts when optimizing loads into smaller loads (e.g. what
SROA does). I'm gonna set the stage by using a small C snippet,
because I think C has the best-understood implementation of TBAA among
the folks on the list. However, I was unable to actually come up with
an example where this inhibits optimizations coming from C code. I
myself have this problem as a real problem in a non-C frontend, which
allows for more stringent AA. With that in mind, consider

void foo(unsigned int *data, int128_t *value) {
    for (int i = 1; i < 100; ++i) {
        data[i] += (int)*value;
    }
}

This will get lowered to something like (pseudo llvm IR here)

for.body:
%l = load i128, i128* %value, !tbaa !1
%t = trunc %l to i32
%gep = ...
%old = load i32, i32* %gep, !tbaa !2
%new = add i32 %t, %old
store i32 %new, %gep, !tbaa !2

Now, it seems like a perfectly reasonable optimization for the
optimizer (though it currently doesn't do that), to fold %l and %t to:

%b = bitcast i128* %value to i32*
%t = load i32, i32 *%b

However, what can we say about the tbaa of %t at this point? With the
current semantics, I'm not sure we're allowed to say anything (in this
case maybe we're still allowed to say !tbaa !1, but consider what
happens for say *value>>32, which we could fold to gep+load). However,
that would mean that we lose the information that %value does not
alias %data, so e.g. we can't perform LICM anymore.

Now, as I said, I couldn't actually get this to be a problem, because
C only uses scalar TBAA for rather small values. However, in my
frontend, for certain memory access I know that there are no interior
pointer, so I emit scalar TBAA for a larger struct. Then SROA comes
along and splits it, but it drops the TBAA.

So the main points of discussion are:
1) Is there anything we can say about a smaller load created from a
TBAA annotated larger load in the current semantics?
2) If not, what is the best way to retain this information. E.g.
should we have a TBAA node that says "this access was n bytes into a
scalar TBAA of type !1"?

I would very much appreciate any guidance on the right approach to
this (perfectly willing and planning to code this up myself - but
would appreciate discussion on direction).

Thanks,
Keno

Sanjoy Das via llvm-dev

2017-Apr-11 01:44 UTC

head link

[llvm-dev] TBAA for subset of a loaded value

Hi Keno,

On April 10, 2017 at 6:02:38 PM, Keno Fischer via llvm-dev
(llvm-dev at lists.llvm.org) wrote:> I'm interested in what we can do about TBAA for loads that the
> compiler inserts when optimizing loads into smaller loads (e.g. what
> SROA does). I'm gonna set the stage by using a small C snippet,
> because I think C has the best-understood implementation of TBAA among
> the folks on the list. However, I was unable to actually come up with
> an example where this inhibits optimizations coming from C code. I
> myself have this problem as a real problem in a non-C frontend, which
> allows for more stringent AA. With that in mind, consider
>
> void foo(unsigned int *data, int128_t *value) {
> for (int i = 1; i < 100; ++i) {
> data[i] += (int)*value;
> }
> }
>
> This will get lowered to something like (pseudo llvm IR here)
>
> for.body:
> %l = load i128, i128* %value, !tbaa !1
> %t = trunc %l to i32
> %gep = ...
> %old = load i32, i32* %gep, !tbaa !2
> %new = add i32 %t, %old
> store i32 %new, %gep, !tbaa !2
>
> Now, it seems like a perfectly reasonable optimization for the
> optimizer (though it currently doesn't do that), to fold %l and %t to:
>
> %b = bitcast i128* %value to i32*
> %t = load i32, i32 *%b
>
> However, what can we say about the tbaa of %t at this point? With the
> current semantics, I'm not sure we're allowed to say anything (in
this
> case maybe we're still allowed to say !tbaa !1, but consider what
> happens for say *value>>32, which we could fold to gep+load).
However,
> that would mean that we lose the information that %value does not
> alias %data, so e.g. we can't perform LICM anymore.
>
> Now, as I said, I couldn't actually get this to be a problem, because
> C only uses scalar TBAA for rather small values. However, in my
> frontend, for certain memory access I know that there are no interior
> pointer, so I emit scalar TBAA for a larger struct. Then SROA comes
> along and splits it, but it drops the TBAA.
>
> So the main points of discussion are:
> 1) Is there anything we can say about a smaller load created from a
> TBAA annotated larger load in the current semantics?
> 2) If not, what is the best way to retain this information. E.g.
> should we have a TBAA node that says "this access was n bytes into a
> scalar TBAA of type !1"?
>
I'm not sure if we can do something with the TBAA metadata we have
*today*, but I think there is hope if we're willing to change our TBAA
metadata representation.

In particular, if we allow access types[0] of TBAA tags to be struct
types then we can probably solve the problem.

In your case, we'd start with %l tagged with a scalar access type,
!i128ty:

  %l = load i128, i128* %value, !tbaa !{!i128ty, !i128ty, i64 0}
  %t = trunc %l to i32

which after narrowing would be

  %l = load i32, i32* %value, !tbaa !{!i128ty, !i32ty, i64 0}
  %t = trunc %l to i32

and we'd also change !i128ty from a scalar type to a struct type,
containing a !i32ty at the appropriate offset.  Of course, the loads
and stores which mention !i128ty as their access type now have a
struct type as their access type, which is why we have to make the
generalization I mentioned.

[0]: http://llvm.org/docs/LangRef.html#semantics

-- Sanjoy

> I would very much appreciate any guidance on the right approach to
> this (perfectly willing and planning to code this up myself - but
> would appreciate discussion on direction).
>
> Thanks,
> Keno
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Hal Finkel via llvm-dev

2017-Apr-11 01:47 UTC

head link

[llvm-dev] TBAA for subset of a loaded value

On 04/10/2017 08:01 PM, Keno Fischer wrote:> I'm interested in what we can do about TBAA for loads that the
> compiler inserts when optimizing loads into smaller loads (e.g. what
> SROA does). I'm gonna set the stage by using a small C snippet,
> because I think C has the best-understood implementation of TBAA among
> the folks on the list. However, I was unable to actually come up with
> an example where this inhibits optimizations coming from C code. I
> myself have this problem as a real problem in a non-C frontend, which
> allows for more stringent AA. With that in mind, consider
>
> void foo(unsigned int *data, int128_t *value) {
>      for (int i = 1; i < 100; ++i) {
>          data[i] += (int)*value;
>      }
> }
>
> This will get lowered to something like (pseudo llvm IR here)
>
> for.body:
> %l = load i128, i128* %value, !tbaa !1
> %t = trunc %l to i32
> %gep = ...
> %old = load i32, i32* %gep, !tbaa !2
> %new = add i32 %t, %old
> store i32 %new, %gep, !tbaa !2
>
> Now, it seems like a perfectly reasonable optimization for the
> optimizer (though it currently doesn't do that), to fold %l and %t to:
>
> %b = bitcast i128* %value to i32*
> %t = load i32, i32 *%b
>
> However, what can we say about the tbaa of %t at this point? With the
> current semantics, I'm not sure we're allowed to say anything (in
this
> case maybe we're still allowed to say !tbaa !1, but consider what
> happens for say *value>>32, which we could fold to gep+load).
I think we can still tag the high-part load with the same TBAA metadata. 
The TBAA metadata is not really attached to pointer, but to memory 
accesses. The access includes the size, and TBAA also rules out partial 
aliases. Thus, if we have a large load with TBAA, we can tag all 
"interior" loads with the same TBAA metadata.

Thus, I think SROA just needs to propagate the TBAA metadata from the 
original access to all of the accesses it creates from it.

  -Hal
>   However,
> that would mean that we lose the information that %value does not
> alias %data, so e.g. we can't perform LICM anymore.
>
> Now, as I said, I couldn't actually get this to be a problem, because
> C only uses scalar TBAA for rather small values. However, in my
> frontend, for certain memory access I know that there are no interior
> pointer, so I emit scalar TBAA for a larger struct. Then SROA comes
> along and splits it, but it drops the TBAA.
>
> So the main points of discussion are:
> 1) Is there anything we can say about a smaller load created from a
> TBAA annotated larger load in the current semantics?
> 2) If not, what is the best way to retain this information. E.g.
> should we have a TBAA node that says "this access was n bytes into a
> scalar TBAA of type !1"?
>
> I would very much appreciate any guidance on the right approach to
> this (perfectly willing and planning to code this up myself - but
> would appreciate discussion on direction).
>
> Thanks,
> Keno
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel via llvm-dev

2017-Apr-11 02:01 UTC

head link

[llvm-dev] TBAA for subset of a loaded value

On 04/10/2017 08:44 PM, Sanjoy Das wrote:> Hi Keno,
>
> On April 10, 2017 at 6:02:38 PM, Keno Fischer via llvm-dev
> (llvm-dev at lists.llvm.org) wrote:
>> I'm interested in what we can do about TBAA for loads that the
>> compiler inserts when optimizing loads into smaller loads (e.g. what
>> SROA does). I'm gonna set the stage by using a small C snippet,
>> because I think C has the best-understood implementation of TBAA among
>> the folks on the list. However, I was unable to actually come up with
>> an example where this inhibits optimizations coming from C code. I
>> myself have this problem as a real problem in a non-C frontend, which
>> allows for more stringent AA. With that in mind, consider
>>
>> void foo(unsigned int *data, int128_t *value) {
>> for (int i = 1; i < 100; ++i) {
>> data[i] += (int)*value;
>> }
>> }
>>
>> This will get lowered to something like (pseudo llvm IR here)
>>
>> for.body:
>> %l = load i128, i128* %value, !tbaa !1
>> %t = trunc %l to i32
>> %gep = ...
>> %old = load i32, i32* %gep, !tbaa !2
>> %new = add i32 %t, %old
>> store i32 %new, %gep, !tbaa !2
>>
>> Now, it seems like a perfectly reasonable optimization for the
>> optimizer (though it currently doesn't do that), to fold %l and %t
to:
>>
>> %b = bitcast i128* %value to i32*
>> %t = load i32, i32 *%b
>>
>> However, what can we say about the tbaa of %t at this point? With the
>> current semantics, I'm not sure we're allowed to say anything
(in this
>> case maybe we're still allowed to say !tbaa !1, but consider what
>> happens for say *value>>32, which we could fold to gep+load).
However,
>> that would mean that we lose the information that %value does not
>> alias %data, so e.g. we can't perform LICM anymore.
>>
>> Now, as I said, I couldn't actually get this to be a problem,
because
>> C only uses scalar TBAA for rather small values. However, in my
>> frontend, for certain memory access I know that there are no interior
>> pointer, so I emit scalar TBAA for a larger struct. Then SROA comes
>> along and splits it, but it drops the TBAA.
>>
>> So the main points of discussion are:
>> 1) Is there anything we can say about a smaller load created from a
>> TBAA annotated larger load in the current semantics?
>> 2) If not, what is the best way to retain this information. E.g.
>> should we have a TBAA node that says "this access was n bytes into
a
>> scalar TBAA of type !1"?
>>
> I'm not sure if we can do something with the TBAA metadata we have
> *today*, but I think there is hope if we're willing to change our TBAA
> metadata representation.
>
> In particular, if we allow access types[0] of TBAA tags to be struct
> types then we can probably solve the problem.
>
> In your case, we'd start with %l tagged with a scalar access type,
> !i128ty:
>
>    %l = load i128, i128* %value, !tbaa !{!i128ty, !i128ty, i64 0}
>    %t = trunc %l to i32
>
> which after narrowing would be
>
>    %l = load i32, i32* %value, !tbaa !{!i128ty, !i32ty, i64 0}
>    %t = trunc %l to i32
>
> and we'd also change !i128ty from a scalar type to a struct type,
> containing a !i32ty at the appropriate offset.  Of course, the loads
> and stores which mention !i128ty as their access type now have a
> struct type as their access type, which is why we have to make the
> generalization I mentioned.
>
> [0]: http://llvm.org/docs/LangRef.html#semantics'
As I mentioned a few moments ago, I think we can just propagate the 
metadata. One thing which it worth pointing out is that, when TBAA 
checks an access, it does not actually check the access size (our struct 
path metadata has offsets in it, but we don't really use the sizes in 
conjunction with those, we're just checking for an offset match up the 
node hierarchy).

  -Hal
>
> -- Sanjoy
>
>
>> I would very much appreciate any guidance on the right approach to
>> this (perfectly willing and planning to code this up myself - but
>> would appreciate discussion on direction).
>>
>> Thanks,
>> Keno
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

llvm dev - Apr 2017 - TBAA for subset of a loaded value

[llvm-dev] TBAA for subset of a loaded value

[llvm-dev] TBAA for subset of a loaded value

[llvm-dev] TBAA for subset of a loaded value

[llvm-dev] TBAA for subset of a loaded value