thr3ads.net - llvm dev - [LLVMdev] Optimization hints for "constant" loads [Oct 2014]

If this information is useful, please help other people find it:
Share via:

Philip Reames

2014-Sep-10 05:11 UTC

[LLVMdev] Optimization hints for "constant" loads

I'm looking at how to optimize IR which contains reads from a field 
which is known to be initialized exactly once.  I've got an idea on how 
to approach this, but wanted to see if others have alternate ideas or 
similar problems which might spark discussion.  It feels like there's a 
potentially generally useful optimization hint here if we can generalize 
it sufficiently without loosing optimization potential.

The problem:
struct array {
private:
   // once initialized 'len' never changes
   int len;
   // data can be modified at will
   char data[0];
public:
   static array* make(int len) {
     array* a = ... allocate uninitialized space
     a->len = len;
     return a;
   }
};
void access(array* a, int idx) {
   if( idx >= 0 && idx <- a->len ) {
     a->data[idx] = 5;
   }
}
void foo(array* a) {
   for(int i = 0; i < a->len; i++) {
     access(a, i);
   }
}
// assume 'access' is inlined into 'foo' and the loop is
unrolled a time
or two

To phrase that again in english, I've got a field which is initialized 
once, with naive code which reads from it many times.  I know at IR 
generation time that a load from array::len is special, but I loose this 
information as soon as I generate IR.  In particular, I run into 
aliasing problems where the location a->len is considered 'mayalias' 
with unrelated stores thus preventing value forwarding, LICM, and other 
desirable optimizations.

Existing Approaches:
1) Use TBAA -  By tagging loads and stores to the two fields of the 
array struct as disjoint branches in the TBAA tree, I can inform LLVM 
that a load of 'len' never aliases with a store through 'data'. 
This
mostly works, and enables many loads to be forwarded by GVN, but (due to 
the intervening stores) is a complete loss in EarlyCSE and (due to 
intervening calls) LICM.
    a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could 
improve the situation in EarlyCSE.
2) Use "invariant.load" metadata - This metadata indicates that the 
field loaded from is initialized before the execution of the code being 
compiled.  In particular, "invariant.load" implies that the load is
not
control dependent on any branch, is safe to speculate, and that no write 
aliases the location read from.  This mostly works, but only if 
'array::make' is never visible in the IR.   As soon as
'array::make'
gets inlined, all bets are off and mis-compiles may result.
    a) Also, in practice, only LICM really knows about 
"invariant.load".  It would be pretty straight forward to teach both 
EarlyCSE and GVN about them though. 
http://llvm.org/bugs/show_bug.cgi?id=20806

New Approaches:
(This isn't so much "being proposed" as "being put forward
for discussion".)

1) Introduce a new metadata type "initialized-before-load" which
implies
that the value of any two loads tagged with the metadata along any 
execution path must yield the same result.

This doesn't give much freedom to the 'first' load; it's still
control
dependent, can't be reordered with preceding stores, can't be lifted out
of loops, etc...  It can however be reordered with following stores or 
sunk out of a loop provided the loop body is known to execute at least 
once.

The second load has a lot more freedom.  Provided that there is always 
another load to the same location (with the metadata) provable preceding 
it on all paths, it can be reordered freely, lifted over control 
branches, lifted out of loops, etc... Importantly, it is also legal to 
forward the value of a preceding load to a later load provided that a) 
both have the metadata and b) that one load executes strictly before the 
other.

A store marked "initialized-before-load" is undefined if there is a
load
with the same metadata on the same location preceding it. There may be 
*multiple* stores to the location along a path, provided that the first 
load is strictly after *all* of them.

This seems staight forward to implement in EarlyCSE and LICM.  I haven't 
looked closely at GVN, but expect it's probably not hard either.

2) Introduce a slightly different metadata "initialized-once".
Semantics
are very similar to the preceding except that there can only be a single 
store to the location along any path.

Value forwarding from the store to a following load (with metadata) is 
allowed regardless of potentially aliasing intervening stores.

This was actually my original idea, but it has a couple of problems.  
First, it breaks on surprisingly common initialization patterns such as 
default initialization followed by real initialization.  Secondly, I'm 
not sure the optimizer would always preserve the "write once"
property.
In particular, the optimizer is free to divide a large write into 
several smaller ones (assuming the write is not atomic.)



Thoughts?  Suggestions?  Similar sounding problems this might be able to 
solve?

Philip

Kevin Modzelewski

2014-Sep-10 21:42 UTC

head link

[LLVMdev] Optimization hints for "constant" loads

We have some similar cases and wanted the same thing; what we were doing
for a while is using the third "is constant" field of the TBAA
metadata and
setting that to 1.  I'm not 100% sure what the semantics of that are --
LangRef says it means that pointsToConstantMemory() returns true which
means that the memory is "impossible ... to be modified", which seems
like
not quite a fit for this set-exactly-once use case.  In practice, looking
at the IR after our optimization pipeline, we were getting the results we
wanted: if a store and subsequent loads were seen together, the store would
remain and the value would be forwarded to all the loads.  (I don't think I
looked at the "multiple loads with no visible store which should get
collapsed to a single load" case.)  ymmv

We've disabled the optimization for now, since without an easy way of
putting the annotation in the C++ source code and getting it passed through
clang, it became a burden to keep the application logic in sync with the
tbaa-fixup code that lived in a different place.  We'll fix it eventually...

kmod


On Tue, Sep 9, 2014 at 10:11 PM, Philip Reames <listmail at
philipreames.com>
wrote:
> I'm looking at how to optimize IR which contains reads from a field
which
> is known to be initialized exactly once.  I've got an idea on how to
> approach this, but wanted to see if others have alternate ideas or similar
> problems which might spark discussion.  It feels like there's a
potentially
> generally useful optimization hint here if we can generalize it
> sufficiently without loosing optimization potential.
>
> The problem:
> struct array {
> private:
>   // once initialized 'len' never changes
>   int len;
>   // data can be modified at will
>   char data[0];
> public:
>   static array* make(int len) {
>     array* a = ... allocate uninitialized space
>     a->len = len;
>     return a;
>   }
> };
> void access(array* a, int idx) {
>   if( idx >= 0 && idx <- a->len ) {
>     a->data[idx] = 5;
>   }
> }
> void foo(array* a) {
>   for(int i = 0; i < a->len; i++) {
>     access(a, i);
>   }
> }
> // assume 'access' is inlined into 'foo' and the loop is
unrolled a time
> or two
>
> To phrase that again in english, I've got a field which is initialized
> once, with naive code which reads from it many times.  I know at IR
> generation time that a load from array::len is special, but I loose this
> information as soon as I generate IR.  In particular, I run into aliasing
> problems where the location a->len is considered 'mayalias' with
unrelated
> stores thus preventing value forwarding, LICM, and other desirable
> optimizations.
>
> Existing Approaches:
> 1) Use TBAA -  By tagging loads and stores to the two fields of the array
> struct as disjoint branches in the TBAA tree, I can inform LLVM that a load
> of 'len' never aliases with a store through 'data'.  This
mostly works, and
> enables many loads to be forwarded by GVN, but (due to the intervening
> stores) is a complete loss in EarlyCSE and (due to intervening calls) LICM.
>    a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could
> improve the situation in EarlyCSE.
> 2) Use "invariant.load" metadata - This metadata indicates that
the field
> loaded from is initialized before the execution of the code being
> compiled.  In particular, "invariant.load" implies that the load
is not
> control dependent on any branch, is safe to speculate, and that no write
> aliases the location read from.  This mostly works, but only if
> 'array::make' is never visible in the IR.   As soon as
'array::make' gets
> inlined, all bets are off and mis-compiles may result.
>    a) Also, in practice, only LICM really knows about
"invariant.load".
> It would be pretty straight forward to teach both EarlyCSE and GVN about
> them though. http://llvm.org/bugs/show_bug.cgi?id=20806
>
> New Approaches:
> (This isn't so much "being proposed" as "being put
forward for
> discussion".)
>
> 1) Introduce a new metadata type "initialized-before-load" which
implies
> that the value of any two loads tagged with the metadata along any
> execution path must yield the same result.
>
> This doesn't give much freedom to the 'first' load; it's
still control
> dependent, can't be reordered with preceding stores, can't be
lifted out of
> loops, etc...  It can however be reordered with following stores or sunk
> out of a loop provided the loop body is known to execute at least once.
>
> The second load has a lot more freedom.  Provided that there is always
> another load to the same location (with the metadata) provable preceding it
> on all paths, it can be reordered freely, lifted over control branches,
> lifted out of loops, etc... Importantly, it is also legal to forward the
> value of a preceding load to a later load provided that a) both have the
> metadata and b) that one load executes strictly before the other.
>
> A store marked "initialized-before-load" is undefined if there is
a load
> with the same metadata on the same location preceding it. There may be
> *multiple* stores to the location along a path, provided that the first
> load is strictly after *all* of them.
>
> This seems staight forward to implement in EarlyCSE and LICM.  I
haven't
> looked closely at GVN, but expect it's probably not hard either.
>
> 2) Introduce a slightly different metadata "initialized-once".
Semantics
> are very similar to the preceding except that there can only be a single
> store to the location along any path.
>
> Value forwarding from the store to a following load (with metadata) is
> allowed regardless of potentially aliasing intervening stores.
>
> This was actually my original idea, but it has a couple of problems.
> First, it breaks on surprisingly common initialization patterns such as
> default initialization followed by real initialization.  Secondly, I'm
not
> sure the optimizer would always preserve the "write once"
property.  In
> particular, the optimizer is free to divide a large write into several
> smaller ones (assuming the write is not atomic.)
>
>
>
> Thoughts?  Suggestions?  Similar sounding problems this might be able to
> solve?
>
> Philip
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140910/03e19413/attachment.html>

Philip Reames

2014-Sep-10 22:21 UTC

head link

[LLVMdev] Optimization hints for "constant" loads

On 09/10/2014 02:42 PM, Kevin Modzelewski wrote:> We have some similar cases and wanted the same thing; what we were 
> doing for a while is using the third "is constant" field of the
TBAA
> metadata and setting that to 1.  I'm not 100% sure what the semantics 
> of that are -- LangRef says it means that pointsToConstantMemory() 
> returns true which means that the memory is "impossible ... to be 
> modified", which seems like not quite a fit for this set-exactly-once 
> use case.  In practice, looking at the IR after our optimization 
> pipeline, we were getting the results we wanted: if a store and 
> subsequent loads were seen together, the store would remain and the 
> value would be forwarded to all the loads.  (I don't think I looked at 
> the "multiple loads with no visible store which should get collapsed 
> to a single load" case.)  ymmvI hadn't looked at this approach much, but based on the documentation, 
you're basically just asking for miscompiles here. The semantics seem to 
be the same as "invariant.load".   While the optimizer happens to not
be
removing the stores, it seems like it would be perfectly legal for it to 
do so.>
> We've disabled the optimization for now, since without an easy way of 
> putting the annotation in the C++ source code and getting it passed 
> through clang, it became a burden to keep the application logic in 
> sync with the tbaa-fixup code that lived in a different place.  We'll 
> fix it eventually...I can't comment on this part.  I'm assuming the C++ code being compiled 
is your runtime or something?>
> kmod
>
>
> On Tue, Sep 9, 2014 at 10:11 PM, Philip Reames 
> <listmail at philipreames.com <mailto:listmail at
philipreames.com>> wrote:
>
>     I'm looking at how to optimize IR which contains reads from a
>     field which is known to be initialized exactly once.  I've got an
>     idea on how to approach this, but wanted to see if others have
>     alternate ideas or similar problems which might spark discussion. 
>     It feels like there's a potentially generally useful optimization
>     hint here if we can generalize it sufficiently without loosing
>     optimization potential.
>
>     The problem:
>     struct array {
>     private:
>       // once initialized 'len' never changes
>       int len;
>       // data can be modified at will
>       char data[0];
>     public:
>       static array* make(int len) {
>         array* a = ... allocate uninitialized space
>         a->len = len;
>         return a;
>       }
>     };
>     void access(array* a, int idx) {
>       if( idx >= 0 && idx <- a->len ) {
>         a->data[idx] = 5;
>       }
>     }
>     void foo(array* a) {
>       for(int i = 0; i < a->len; i++) {
>         access(a, i);
>       }
>     }
>     // assume 'access' is inlined into 'foo' and the loop
is unrolled
>     a time or two
>
>     To phrase that again in english, I've got a field which is
>     initialized once, with naive code which reads from it many times. 
>     I know at IR generation time that a load from array::len is
>     special, but I loose this information as soon as I generate IR. 
>     In particular, I run into aliasing problems where the location
>     a->len is considered 'mayalias' with unrelated stores thus
>     preventing value forwarding, LICM, and other desirable optimizations.
>
>     Existing Approaches:
>     1) Use TBAA -  By tagging loads and stores to the two fields of
>     the array struct as disjoint branches in the TBAA tree, I can
>     inform LLVM that a load of 'len' never aliases with a store
>     through 'data'.  This mostly works, and enables many loads to
be
>     forwarded by GVN, but (due to the intervening stores) is a
>     complete loss in EarlyCSE and (due to intervening calls) LICM.
>        a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could
>     improve the situation in EarlyCSE.
>     2) Use "invariant.load" metadata - This metadata indicates
that
>     the field loaded from is initialized before the execution of the
>     code being compiled.  In particular, "invariant.load" implies
that
>     the load is not control dependent on any branch, is safe to
>     speculate, and that no write aliases the location read from.  This
>     mostly works, but only if 'array::make' is never visible in the
>     IR.   As soon as 'array::make' gets inlined, all bets are off
and
>     mis-compiles may result.
>        a) Also, in practice, only LICM really knows about
>     "invariant.load".  It would be pretty straight forward to
teach
>     both EarlyCSE and GVN about them though.
>     http://llvm.org/bugs/show_bug.cgi?id=20806
>
>     New Approaches:
>     (This isn't so much "being proposed" as "being put
forward for
>     discussion".)
>
>     1) Introduce a new metadata type "initialized-before-load"
which
>     implies that the value of any two loads tagged with the metadata
>     along any execution path must yield the same result.
>
>     This doesn't give much freedom to the 'first' load;
it's still
>     control dependent, can't be reordered with preceding stores,
can't
>     be lifted out of loops, etc...  It can however be reordered with
>     following stores or sunk out of a loop provided the loop body is
>     known to execute at least once.
>
>     The second load has a lot more freedom.  Provided that there is
>     always another load to the same location (with the metadata)
>     provable preceding it on all paths, it can be reordered freely,
>     lifted over control branches, lifted out of loops, etc...
>     Importantly, it is also legal to forward the value of a preceding
>     load to a later load provided that a) both have the metadata and
>     b) that one load executes strictly before the other.
>
>     A store marked "initialized-before-load" is undefined if
there is
>     a load with the same metadata on the same location preceding it.
>     There may be *multiple* stores to the location along a path,
>     provided that the first load is strictly after *all* of them.
>
>     This seems staight forward to implement in EarlyCSE and LICM.  I
>     haven't looked closely at GVN, but expect it's probably not
hard
>     either.
>
>     2) Introduce a slightly different metadata
"initialized-once".
>     Semantics are very similar to the preceding except that there can
>     only be a single store to the location along any path.
>
>     Value forwarding from the store to a following load (with
>     metadata) is allowed regardless of potentially aliasing
>     intervening stores.
>
>     This was actually my original idea, but it has a couple of
>     problems.  First, it breaks on surprisingly common initialization
>     patterns such as default initialization followed by real
>     initialization.  Secondly, I'm not sure the optimizer would always
>     preserve the "write once" property. In particular, the
optimizer
>     is free to divide a large write into several smaller ones
>     (assuming the write is not atomic.)
>
>
>
>     Thoughts?  Suggestions?  Similar sounding problems this might be
>     able to solve?
>
>     Philip
>
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140910/e0fa6e27/attachment.html>

Hal Finkel

2014-Sep-28 08:22 UTC

head link

[LLVMdev] Optimization hints for "constant" loads

----- Original Message -----> From: "Philip Reames" <listmail at philipreames.com>
> To: llvmdev at cs.uiuc.edu
> Cc: "Hal Finkel" <hfinkel at anl.gov>, nicholas at mxc.ca
> Sent: Wednesday, September 10, 2014 12:11:28 AM
> Subject: Optimization hints for "constant" loads
> 
> I'm looking at how to optimize IR which contains reads from a field
> which is known to be initialized exactly once.  I've got an idea on
> how
> to approach this, but wanted to see if others have alternate ideas or
> similar problems which might spark discussion.  It feels like there's
> a
> potentially generally useful optimization hint here if we can
> generalize
> it sufficiently without loosing optimization potential.
> 
> The problem:
> struct array {
> private:
>    // once initialized 'len' never changes
>    int len;
>    // data can be modified at will
>    char data[0];
> public:
>    static array* make(int len) {
>      array* a = ... allocate uninitialized space
>      a->len = len;
>      return a;
>    }
> };
> void access(array* a, int idx) {
>    if( idx >= 0 && idx <- a->len ) {
>      a->data[idx] = 5;
>    }
> }
> void foo(array* a) {
>    for(int i = 0; i < a->len; i++) {
>      access(a, i);
>    }
> }
> // assume 'access' is inlined into 'foo' and the loop is
unrolled a
> time
> or two
> 
> To phrase that again in english, I've got a field which is
> initialized
> once, with naive code which reads from it many times.  I know at IR
> generation time that a load from array::len is special, but I loose
> this
> information as soon as I generate IR.  In particular, I run into
> aliasing problems where the location a->len is considered
'mayalias'
> with unrelated stores thus preventing value forwarding, LICM, and
> other
> desirable optimizations.
> 
> Existing Approaches:
I think that, at least in theory, we already have a solution to this problem.
If, after the initialization is complete, you insert a call to the
llvm.invariant.start intrinsic (and perhaps also do so at the start of any
routine you know can be called only after initialization is complete), that
should convey the information you want.

I've never worked with this intrinsic before, but if this does not work,
I'd be really curious to know why.

Thanks again,
Hal

> 1) Use TBAA -  By tagging loads and stores to the two fields of the
> array struct as disjoint branches in the TBAA tree, I can inform LLVM
> that a load of 'len' never aliases with a store through
'data'.  This
> mostly works, and enables many loads to be forwarded by GVN, but (due
> to
> the intervening stores) is a complete loss in EarlyCSE and (due to
> intervening calls) LICM.
>     a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could
> improve the situation in EarlyCSE.
> 2) Use "invariant.load" metadata - This metadata indicates that
the
> field loaded from is initialized before the execution of the code
> being
> compiled.  In particular, "invariant.load" implies that the load
is
> not
> control dependent on any branch, is safe to speculate, and that no
> write
> aliases the location read from.  This mostly works, but only if
> 'array::make' is never visible in the IR.   As soon as
'array::make'
> gets inlined, all bets are off and mis-compiles may result.
>     a) Also, in practice, only LICM really knows about
> "invariant.load".  It would be pretty straight forward to teach
both
> EarlyCSE and GVN about them though.
> http://llvm.org/bugs/show_bug.cgi?id=20806
> 
> New Approaches:
> (This isn't so much "being proposed" as "being put
forward for
> discussion".)
> 
> 1) Introduce a new metadata type "initialized-before-load" which
> implies
> that the value of any two loads tagged with the metadata along any
> execution path must yield the same result.
> 
> This doesn't give much freedom to the 'first' load; it's
still
> control
> dependent, can't be reordered with preceding stores, can't be
lifted
> out
> of loops, etc...  It can however be reordered with following stores
> or
> sunk out of a loop provided the loop body is known to execute at
> least
> once.
> 
> The second load has a lot more freedom.  Provided that there is
> always
> another load to the same location (with the metadata) provable
> preceding
> it on all paths, it can be reordered freely, lifted over control
> branches, lifted out of loops, etc... Importantly, it is also legal
> to
> forward the value of a preceding load to a later load provided that
> a)
> both have the metadata and b) that one load executes strictly before
> the
> other.
> 
> A store marked "initialized-before-load" is undefined if there is
a
> load
> with the same metadata on the same location preceding it. There may
> be
> *multiple* stores to the location along a path, provided that the
> first
> load is strictly after *all* of them.
> 
> This seems staight forward to implement in EarlyCSE and LICM.  I
> haven't
> looked closely at GVN, but expect it's probably not hard either.
> 
> 2) Introduce a slightly different metadata "initialized-once".
> Semantics
> are very similar to the preceding except that there can only be a
> single
> store to the location along any path.
> 
> Value forwarding from the store to a following load (with metadata)
> is
> allowed regardless of potentially aliasing intervening stores.
> 
> This was actually my original idea, but it has a couple of problems.
> First, it breaks on surprisingly common initialization patterns such
> as
> default initialization followed by real initialization.  Secondly,
> I'm
> not sure the optimizer would always preserve the "write once"
> property.
> In particular, the optimizer is free to divide a large write into
> several smaller ones (assuming the write is not atomic.)
> 
> 
> 
> Thoughts?  Suggestions?  Similar sounding problems this might be able
> to
> solve?
> 
> Philip
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Philip Reames

2014-Sep-29 22:33 UTC

head link

[LLVMdev] Optimization hints for "constant" loads

On 09/28/2014 01:22 AM, Hal Finkel wrote:> ----- Original Message -----
>> From: "Philip Reames" <listmail at philipreames.com>
>> To: llvmdev at cs.uiuc.edu
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, nicholas at
mxc.ca
>> Sent: Wednesday, September 10, 2014 12:11:28 AM
>> Subject: Optimization hints for "constant" loads
>>
>> I'm looking at how to optimize IR which contains reads from a field
>> which is known to be initialized exactly once.  I've got an idea on
>> how
>> to approach this, but wanted to see if others have alternate ideas or
>> similar problems which might spark discussion.  It feels like
there's
>> a
>> potentially generally useful optimization hint here if we can
>> generalize
>> it sufficiently without loosing optimization potential.
>>
>> The problem:
>> struct array {
>> private:
>>     // once initialized 'len' never changes
>>     int len;
>>     // data can be modified at will
>>     char data[0];
>> public:
>>     static array* make(int len) {
>>       array* a = ... allocate uninitialized space
>>       a->len = len;
>>       return a;
>>     }
>> };
>> void access(array* a, int idx) {
>>     if( idx >= 0 && idx <- a->len ) {
>>       a->data[idx] = 5;
>>     }
>> }
>> void foo(array* a) {
>>     for(int i = 0; i < a->len; i++) {
>>       access(a, i);
>>     }
>> }
>> // assume 'access' is inlined into 'foo' and the loop
is unrolled a
>> time
>> or two
>>
>> To phrase that again in english, I've got a field which is
>> initialized
>> once, with naive code which reads from it many times.  I know at IR
>> generation time that a load from array::len is special, but I loose
>> this
>> information as soon as I generate IR.  In particular, I run into
>> aliasing problems where the location a->len is considered
'mayalias'
>> with unrelated stores thus preventing value forwarding, LICM, and
>> other
>> desirable optimizations.
>>
>> Existing Approaches:
> I think that, at least in theory, we already have a solution to this
problem. If, after the initialization is complete, you insert a call to the
llvm.invariant.start intrinsic (and perhaps also do so at the start of any
routine you know can be called only after initialization is complete), that
should convey the information you want.
>
> I've never worked with this intrinsic before, but if this does not
work, I'd be really curious to know why.Huh.  You're right, this should work.  I'm going to run some experiments
and see what happens in practice.

Thanks for the pointer on this!  Not sure how I managed to miss this 
approach.>
> Thanks again,
> Hal
>
>
>> 1) Use TBAA -  By tagging loads and stores to the two fields of the
>> array struct as disjoint branches in the TBAA tree, I can inform LLVM
>> that a load of 'len' never aliases with a store through
'data'.  This
>> mostly works, and enables many loads to be forwarded by GVN, but (due
>> to
>> the intervening stores) is a complete loss in EarlyCSE and (due to
>> intervening calls) LICM.
>>      a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could
>> improve the situation in EarlyCSE.
>> 2) Use "invariant.load" metadata - This metadata indicates
that the
>> field loaded from is initialized before the execution of the code
>> being
>> compiled.  In particular, "invariant.load" implies that the
load is
>> not
>> control dependent on any branch, is safe to speculate, and that no
>> write
>> aliases the location read from.  This mostly works, but only if
>> 'array::make' is never visible in the IR.   As soon as
'array::make'
>> gets inlined, all bets are off and mis-compiles may result.
>>      a) Also, in practice, only LICM really knows about
>> "invariant.load".  It would be pretty straight forward to
teach both
>> EarlyCSE and GVN about them though.
>> http://llvm.org/bugs/show_bug.cgi?id=20806
>>
>> New Approaches:
>> (This isn't so much "being proposed" as "being put
forward for
>> discussion".)
>>
>> 1) Introduce a new metadata type "initialized-before-load"
which
>> implies
>> that the value of any two loads tagged with the metadata along any
>> execution path must yield the same result.
>>
>> This doesn't give much freedom to the 'first' load;
it's still
>> control
>> dependent, can't be reordered with preceding stores, can't be
lifted
>> out
>> of loops, etc...  It can however be reordered with following stores
>> or
>> sunk out of a loop provided the loop body is known to execute at
>> least
>> once.
>>
>> The second load has a lot more freedom.  Provided that there is
>> always
>> another load to the same location (with the metadata) provable
>> preceding
>> it on all paths, it can be reordered freely, lifted over control
>> branches, lifted out of loops, etc... Importantly, it is also legal
>> to
>> forward the value of a preceding load to a later load provided that
>> a)
>> both have the metadata and b) that one load executes strictly before
>> the
>> other.
>>
>> A store marked "initialized-before-load" is undefined if
there is a
>> load
>> with the same metadata on the same location preceding it. There may
>> be
>> *multiple* stores to the location along a path, provided that the
>> first
>> load is strictly after *all* of them.
>>
>> This seems staight forward to implement in EarlyCSE and LICM.  I
>> haven't
>> looked closely at GVN, but expect it's probably not hard either.
>>
>> 2) Introduce a slightly different metadata
"initialized-once".
>> Semantics
>> are very similar to the preceding except that there can only be a
>> single
>> store to the location along any path.
>>
>> Value forwarding from the store to a following load (with metadata)
>> is
>> allowed regardless of potentially aliasing intervening stores.
>>
>> This was actually my original idea, but it has a couple of problems.
>> First, it breaks on surprisingly common initialization patterns such
>> as
>> default initialization followed by real initialization.  Secondly,
>> I'm
>> not sure the optimizer would always preserve the "write once"
>> property.
>> In particular, the optimizer is free to divide a large write into
>> several smaller ones (assuming the write is not atomic.)
>>
>>
>>
>> Thoughts?  Suggestions?  Similar sounding problems this might be able
>> to
>> solve?
>>
>> Philip
>>
>>

Philip Reames

2014-Oct-11 01:07 UTC

head link

[LLVMdev] Optimization hints for "constant" loads

On 09/28/2014 01:22 AM, Hal Finkel wrote:> ----- Original Message -----
>> From: "Philip Reames" <listmail at philipreames.com>
>> To: llvmdev at cs.uiuc.edu
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, nicholas at
mxc.ca
>> Sent: Wednesday, September 10, 2014 12:11:28 AM
>> Subject: Optimization hints for "constant" loads
>>
>> I'm looking at how to optimize IR which contains reads from a field
>> which is known to be initialized exactly once.  I've got an idea on
>> how
>> to approach this, but wanted to see if others have alternate ideas or
>> similar problems which might spark discussion.  It feels like
there's
>> a
>> potentially generally useful optimization hint here if we can
>> generalize
>> it sufficiently without loosing optimization potential.
>>
>> The problem:
>> struct array {
>> private:
>>     // once initialized 'len' never changes
>>     int len;
>>     // data can be modified at will
>>     char data[0];
>> public:
>>     static array* make(int len) {
>>       array* a = ... allocate uninitialized space
>>       a->len = len;
>>       return a;
>>     }
>> };
>> void access(array* a, int idx) {
>>     if( idx >= 0 && idx <- a->len ) {
>>       a->data[idx] = 5;
>>     }
>> }
>> void foo(array* a) {
>>     for(int i = 0; i < a->len; i++) {
>>       access(a, i);
>>     }
>> }
>> // assume 'access' is inlined into 'foo' and the loop
is unrolled a
>> time
>> or two
>>
>> To phrase that again in english, I've got a field which is
>> initialized
>> once, with naive code which reads from it many times.  I know at IR
>> generation time that a load from array::len is special, but I loose
>> this
>> information as soon as I generate IR.  In particular, I run into
>> aliasing problems where the location a->len is considered
'mayalias'
>> with unrelated stores thus preventing value forwarding, LICM, and
>> other
>> desirable optimizations.
>>
>> Existing Approaches:
> I think that, at least in theory, we already have a solution to this
problem. If, after the initialization is complete, you insert a call to the
llvm.invariant.start intrinsic (and perhaps also do so at the start of any
routine you know can be called only after initialization is complete), that
should convey the information you want.
>
> I've never worked with this intrinsic before, but if this does not
work, I'd be really curious to know why. From some experiments, the existing invariant intrinsics appear nearly 
useless for my purposes.  The problem is that any call can hide a 
invariant.end intrinsic.  As a result, the optimizer must conservatively 
assume that any call clobbers the "invariant" location.  This makes
the
intrinsic a non-starter in it's current form.

; Function Attrs: uwtable
define zeroext i1 @_Z4testv() #0 {
   %1 = tail call noalias i8* @_Znwm(i64 4) #3
   %2 = bitcast i8* %1 to i32*
   store i32 5, i32* %2, align 4, !tbaa !1
   %discard = call {}* @llvm.invariant.start(i64 4, i8* %1)
   tail call void @_Z6escapePi(i32* %2)
   %3 = load i32* %2, align 4, !tbaa !1
   %4 = icmp eq i32 %3, 5 <-- this conditional should be folded away and is
not
   ret i1 %4
}

declare {}* @llvm.invariant.start(i64, i8*)

; Function Attrs: nobuiltin
declare noalias i8* @_Znwm(i64) #1

declare void @_Z6escapePi(i32*) #2

It also appears that the intrinsic has limited implementation in the 
optimizer.  Even surprisingly simple cases don't appear to kick in. 
Consider:
define zeroext i1 @_Z4testv() #0 {
   %1 = tail call noalias i8* @_Znwm(i64 4) #4
   %2 = bitcast i8* %1 to i32*
   store i32 5, i32* %2, align 4, !tbaa !1
   %discard = tail call {}* @llvm.invariant.start(i64 4, i8* %1)
   %3 = load i32* %2, align 4, !tbaa !1
   %4 = icmp eq i32 %3, 5 <-- This conditional should be folded
   tail call void @_Z6escapePi(i32* %2, i1 %4)
   %5 = load i32* %2, align 4, !tbaa !1
   %6 = icmp eq i32 %5, 5
   ret i1 %6
}


We could extend the existing intrinsic with a notion of invariant.start 
that has no end.  This could be as simple as adding a boolean parameter 
to the intrinsic.  I think this could be made to work.  There'd be other 
implementation work needed to make it actually useful, the validity 
based on dominance model used by assumes seems like an obvious candidate.

Thinking through this, I see a couple of places that we'd change:
- isLocationKnownInvariant(Value* Loc, Instruction* Cntx) and related 
analysis pass (analogous to assumptions)
   - The new "no end" version is easy (dominance)
   - The older version is a graph walk looking paths which include 
either an end, or call.
- eagerly propagate known invariant location store values in EarlyCSE 
(strictly by dominance for the new intrinsic form)
- Add an InstCombine rule to fold a store to a known invariant location 
to undef
- Teach GVN about it (gets the non dominance cases), could also do fun 
things which phis of invariant locations, but not sure that's actually 
useful
- Teach LICM to lift loads of known invariant locations out of loops

Does this seem like a workable approach?

Philip>> 1) Use TBAA -  By tagging loads and stores to the two fields of the
>> array struct as disjoint branches in the TBAA tree, I can inform LLVM
>> that a load of 'len' never aliases with a store through
'data'.  This
>> mostly works, and enables many loads to be forwarded by GVN, but (due
>> to
>> the intervening stores) is a complete loss in EarlyCSE and (due to
>> intervening calls) LICM.
>>      a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could
>> improve the situation in EarlyCSE.
>> 2) Use "invariant.load" metadata - This metadata indicates
that the
>> field loaded from is initialized before the execution of the code
>> being
>> compiled.  In particular, "invariant.load" implies that the
load is
>> not
>> control dependent on any branch, is safe to speculate, and that no
>> write
>> aliases the location read from.  This mostly works, but only if
>> 'array::make' is never visible in the IR.   As soon as
'array::make'
>> gets inlined, all bets are off and mis-compiles may result.
>>      a) Also, in practice, only LICM really knows about
>> "invariant.load".  It would be pretty straight forward to
teach both
>> EarlyCSE and GVN about them though.
>> http://llvm.org/bugs/show_bug.cgi?id=20806
>>
>> New Approaches:
>> (This isn't so much "being proposed" as "being put
forward for
>> discussion".)
>>
>> 1) Introduce a new metadata type "initialized-before-load"
which
>> implies
>> that the value of any two loads tagged with the metadata along any
>> execution path must yield the same result.
>>
>> This doesn't give much freedom to the 'first' load;
it's still
>> control
>> dependent, can't be reordered with preceding stores, can't be
lifted
>> out
>> of loops, etc...  It can however be reordered with following stores
>> or
>> sunk out of a loop provided the loop body is known to execute at
>> least
>> once.
>>
>> The second load has a lot more freedom.  Provided that there is
>> always
>> another load to the same location (with the metadata) provable
>> preceding
>> it on all paths, it can be reordered freely, lifted over control
>> branches, lifted out of loops, etc... Importantly, it is also legal
>> to
>> forward the value of a preceding load to a later load provided that
>> a)
>> both have the metadata and b) that one load executes strictly before
>> the
>> other.
>>
>> A store marked "initialized-before-load" is undefined if
there is a
>> load
>> with the same metadata on the same location preceding it. There may
>> be
>> *multiple* stores to the location along a path, provided that the
>> first
>> load is strictly after *all* of them.
>>
>> This seems staight forward to implement in EarlyCSE and LICM.  I
>> haven't
>> looked closely at GVN, but expect it's probably not hard either.
>>
>> 2) Introduce a slightly different metadata
"initialized-once".
>> Semantics
>> are very similar to the preceding except that there can only be a
>> single
>> store to the location along any path.
>>
>> Value forwarding from the store to a following load (with metadata)
>> is
>> allowed regardless of potentially aliasing intervening stores.
>>
>> This was actually my original idea, but it has a couple of problems.
>> First, it breaks on surprisingly common initialization patterns such
>> as
>> default initialization followed by real initialization.  Secondly,
>> I'm
>> not sure the optimizer would always preserve the "write once"
>> property.
>> In particular, the optimizer is free to divide a large write into
>> several smaller ones (assuming the write is not atomic.)
>>
>>
>>
>> Thoughts?  Suggestions?  Similar sounding problems this might be able
>> to
>> solve?
>>
>> Philip
>>
>>

Philip Reames

2014-Dec-01 23:44 UTC

head link

[LLVMdev] TBAA vs !invariant.load metadata semantics

(Spawning a separate subthread off the 'Optimization hints for 
"constant" loads' discussion for a related question. )

Looking at TBAA again, I was reminded that TBAA also contains a third 
field which indicates that "meaning pointsToConstantMemory should return 
true; see other useful AliasAnalysis methods 
<http://llvm.org/docs/AliasAnalysis.html#OtherItfs>".  Looking at
this a
bit, it really seems like this flag has the exact same meaning as 
!invariant.load.

pointsToConstantMemory returns a value for a Location.  Since it is 
entirely legal to have two Locations which describe the same physical 
memory, it seems like you'd be back to the same semantics as 
!invariant.load.

The only uncertainty here is that a Location is clearly (??) 
position/Instruction specific.  Does that also imply that the Location 
is control dependent?  What is the semantics of the following code?
if (is_known_invariant) {
   load %p, !tbaa is_constant=true
}

Is the optimizer allowed to lift the load above the conditional? 
(Assuming it can prove the location is known dereferenceable.)  The 
semantics for !invariant.load clearly allow this, but do the semantics 
for TBAA's constant flag?

I think the answer to these questions is that the load is *not* control 
dependent on the conditional (assuming it's otherwise known 
dereferenceable).  Given this, why do we have both?  Should we 
canonicalize one into the other?

Looking at the current implementations, it appears that TBAA's constant 
flag is more broadly implemented.  On first glance, I'm really tempted 
to just deprecate !invariant.load in place of TBAA's constant flag.  
Thoughts?

Philip



-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141201/976b2945/attachment.html>

Andrew Trick

2014-Dec-02 02:05 UTC

head link

[LLVMdev] TBAA vs !invariant.load metadata semantics

> On Dec 1, 2014, at 3:44 PM, Philip Reames <listmail at
philipreames.com> wrote:
> 
> (Spawning a separate subthread off the 'Optimization hints for
"constant" loads' discussion for a related question. )
> 
> Looking at TBAA again, I was reminded that TBAA also contains a third field
which indicates that "meaning pointsToConstantMemory should return true;
see other useful AliasAnalysis methods
<http://llvm.org/docs/AliasAnalysis.html#OtherItfs>".  Looking at
this a bit, it really seems like this flag has the exact same meaning as
!invariant.load.
> 
> pointsToConstantMemory returns a value for a Location.  Since it is
entirely legal to have two Locations which describe the same physical memory, it
seems like you'd be back to the same semantics as !invariant.load.
> 
> The only uncertainty here is that a Location is clearly (??)
position/Instruction specific.  Does that also imply that the Location is
control dependent?  What is the semantics of the following code?
> if (is_known_invariant) {
>   load %p, !tbaa is_constant=true
> }
> 
> Is the optimizer allowed to lift the load above the conditional?  (Assuming
it can prove the location is known dereferenceable.)  The semantics for
!invariant.load clearly allow this, but do the semantics for TBAA's constant
flag?
> 
> I think the answer to these questions is that the load is *not* control
dependent on the conditional (assuming it's otherwise known
dereferenceable).  Given this, why do we have both?  Should we canonicalize one
into the other?
It would be very confusing if the two had different semantics.

In either case, hoisting the load (without dropping metadata) is definitely
legit. But conservatively, the invariance is still a path sensitive property.
The load is invariant w.r.t. any store as long as control reaches a
use-with-side-effects of the loaded value.

Given:

store %q
m1 = load %p
if <something> {
 m2 = load %p, !invariant.load
}
m = phi(m1, m2)

We can safely convert to:

m2 = load %p, !invariant.load
store %q
m1 = load %p
if <something> {}
m = phi(m1, m2)

But cannot safely convert to:

m = load %p, !invariant.load
store %q

I would *really* like to specify more aggressive semantics so that we can do
that, but haven’t adequately proved we can do that safely. I’m don’t think the
optimizer will do the unsafe thing today, unless there’s an oversight somewhere.
> Looking at the current implementations, it appears that TBAA's constant
flag is more broadly implemented.  On first glance, I'm really tempted to
just deprecate !invariant.load in place of TBAA's constant flag.  Thoughts?
I don’t have a strong opinion here. I’m fine relying on TBAA being enabled to
active these sort of optimizations.

-Andy
> Philip
> 
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141201/eacd0122/attachment.html>

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Oct 2014 - [LLVMdev] Optimization hints for "constant" loads

[LLVMdev] Optimization hints for "constant" loads

[LLVMdev] Optimization hints for "constant" loads

[LLVMdev] Optimization hints for "constant" loads

[LLVMdev] Optimization hints for "constant" loads

[LLVMdev] Optimization hints for "constant" loads

[LLVMdev] Optimization hints for "constant" loads

[LLVMdev] TBAA vs !invariant.load metadata semantics

[LLVMdev] TBAA vs !invariant.load metadata semantics

Maybe Matching Threads