thr3ads.net - llvm dev - [llvm-dev] Generalizing load/store promotion in LICM [Sep 2018]

If this information is useful, please help other people find it:
Share via:

Philip Reames via llvm-dev

2018-Sep-13 20:42 UTC

[llvm-dev] Generalizing load/store promotion in LICM

(minor inline additions)

On 09/13/2018 01:51 AM, Chandler Carruth wrote:> Haven't had time to dig into this, but wanted to add +Alina Sbirlea 
> <mailto:asbirlea at google.com> to the thread as she has been working
on
> promotion and other aspects of LICM for a long time here.
Thanks!> On Wed, Sep 12, 2018 at 11:41 PM Philip Reames 
> <listmail at philipreames.com <mailto:listmail at
philipreames.com>> wrote:
>
>     I'm thinking about making some semi radical changes to load store
>     promotion works in LICM, and figured it would be a good idea to
>     get buy in before I actually started writing code.  :)
>
>     TLDR: legality of sinking stores to exits is hard, can we separate
>     load handling into a super aggressive form of PRE, and use
>     predicated stores to avoid solving legality question?
>
>
>     Background
>
>     We've been seeing an interesting class of problems recently that
>     looks roughly like this:
>
>     for (int = 0; i < N; i++)
>       if (a[i] == 0) // some data dependent check
>         g_count++; // conditional load and store to shared location
>
>     The critical detail in this example is that g_count is a global
>     location which may be accessed concurrently* by multiple threads. 
>     The challenge here is that we don't know whether the store ever
>     executes, and we're not allowed to insert a store along any path
>     that didn't originally contain them.  Said differently, if all
>     elements in "a" are non-zero, we're not allowed to store
to
>     g_count.  We do know that the g_count location is dereferenceable
>     though.
>
>     (*Please, let's avoid the memory model derailment here. I'm
>     simplifying and C++ language rules aren't real useful for my Java
>     language frontend anyways.  In practice, all the access are
>     atomic, but unordered, but we'll leave that out of discussion
>     otherwise.)
>
>     I have two approaches I'm considering here.  These are orthogonal,
>     but I suspect we'll want to implement both.
>
>
>     Proposal A - Use predicated stores in loop exits
>
>     The basic idea is that we don't worry about solving the legality
>     question above, and just insert a store which is predicated on a
>     condition which is true exactly when the original store ran.  In
>     pseudo code, this looks something like:
>
>     bool StoreLegal = false;
>     int LocalCount = g_count;
>     for (int = 0; i < N; i++)
>       if (a[i] == 0) {
>         LocalCount++;
>         StoreLegal = true;
>       }
>     if (StoreLegal) g_count = LocalCount;
>
>     There are two obvious concerns here:
>
>      1. The predicated store might be expensive in practice - true for
>         most current architectures.
>      2. We''re introducing an extra boolean phi cycle around the
loop.
>
>     Talking to a couple of folks offline at the socials over the last
>     few months, the former seems to be the main objection.  I think we
>     can control this by restricting this transform to when the loop is
>     believed to have a high trip count and the conditional path is
>     taken with some frequency.   Getting feedback on this particular
>     point is one of my main reasons for writing this mail.
>
>     The second objection can frequently be resolved by finding other
>     expressions which evaluate to the same boolean.  (In this case, if
>     LocalCount != LocalCountOrig assuming i doesn't overflow.)  We
>     already have a framework with SCEV to do these replacements. 
>     Though, from some quick testing, it would definitely need
>     strengthening. However, SCEV can't remove the extra phi in all
>     cases, so we have to be okay with the extra phi cycle in the
>     general case.  This seems worthwhile to me, but thoughts?
>
>
>     Proposal B - Separate load and store handling into distinct transforms
>
>     (For folks I've discussed this with before, this part is all new.)
>
>     Thinking about the problem, it finally occurred to me that we can
>     decompose the original example into two steps: getting the loads
>     out of the loop, and sinking the stores out of the loop.  If we
>     can accomplish the former, but not the later, we've still made
>     meaningful progress.
>
>     So, what'd we'd essentially have is a load only transformation
>     which produces this:
>     int LocalCount = g_count;
>     for (int = 0; i < N; i++)
>       if (a[i] == 0) {
>         LocalCount++;
>         g_count = LocalCount;
>       }
>
>     At this point, we've reduced memory traffic by half, and opened up
>     the possibility that other parts of the optimizer can exploit the
>     simplified form.  The last point is particular interesting since
>     we generally try to canonicalize loads out of loops, and much of
>     the optimizer is tuned for a form with as much as possible being
>     loop invariant.  As one example, completely by accident, there's
>     some work going on in the LoopVectorizer right now to handle such
>     stores to loop invariant addresses during vectorization.  Putting
>     the two pieces together would let us fully vectorize this loop
>     without needing to sink the stores at all.
>
>     In practice, the existing implementation in LICM would cleanly
>     split along these lines with little problem.
>
>     One way of looking at this load specific transform is as an
>     extreme form of PRE (partial redundancy elimination). Our current
>     PRE implementation doesn't handle cases this complicated.
>It occurred to my later that simply framing the new transform as a 
separate pass (LoopPRE) and using the same AST + SSA construction 
approach would be straight forward.  So, if folks think that having an 
aggressive form of load PRE in LICM is going a bit too far, it'd be easy 
to represent as an optional separate pass.  I'd still prefer having LICM 
contain the logic though.>
>     Thoughts?
>
>     Philip
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180913/c383000f/attachment.html>

Alina Sbirlea via llvm-dev

2018-Sep-13 21:00 UTC

head link

[llvm-dev] Generalizing load/store promotion in LICM

For context, I've been looking into replacing the use of AST
(AliasSetTracker) with MemorySSA in LICM. This works great for
sinking/hoisting but does not apply well for promotion, and one of the
solutions I considered is precisely ripping out the promotion part and
replacing its functionality with a separate PRE pass + possibly store
sinking. FWIW I think that's the right way to go.
I did not get deep enough into working on the solution but I would gladly
have a detailed discussion to move this forward.

Reading into detail now.

Thanks,
Alina

On Thu, Sep 13, 2018 at 1:43 PM Philip Reames <listmail at
philipreames.com>
wrote:
> (minor inline additions)
>
> On 09/13/2018 01:51 AM, Chandler Carruth wrote:
>
> Haven't had time to dig into this, but wanted to add +Alina Sbirlea
> <asbirlea at google.com> to the thread as she has been working on
promotion
> and other aspects of LICM for a long time here.
>
> Thanks!
>
> On Wed, Sep 12, 2018 at 11:41 PM Philip Reames <listmail at
philipreames.com>
> wrote:
>
>> I'm thinking about making some semi radical changes to load store
>> promotion works in LICM, and figured it would be a good idea to get buy
in
>> before I actually started writing code.  :)
>>
>> TLDR: legality of sinking stores to exits is hard, can we separate load
>> handling into a super aggressive form of PRE, and use predicated stores
to
>> avoid solving legality question?
>>
>>
>> Background
>>
>> We've been seeing an interesting class of problems recently that
looks
>> roughly like this:
>>
>> for (int = 0; i < N; i++)
>>   if (a[i] == 0) // some data dependent check
>>     g_count++; // conditional load and store to shared location
>>
>> The critical detail in this example is that g_count is a global
location
>> which may be accessed concurrently* by multiple threads.  The challenge
>> here is that we don't know whether the store ever executes, and
we're not
>> allowed to insert a store along any path that didn't originally
contain
>> them.  Said differently, if all elements in "a" are non-zero,
we're not
>> allowed to store to g_count.  We do know that the g_count location is
>> dereferenceable though.
>>
>> (*Please, let's avoid the memory model derailment here.  I'm
simplifying
>> and C++ language rules aren't real useful for my Java language
frontend
>> anyways.  In practice, all the access are atomic, but unordered, but
we'll
>> leave that out of discussion otherwise.)
>>
>> I have two approaches I'm considering here.  These are orthogonal,
but I
>> suspect we'll want to implement both.
>>
>>
>> Proposal A - Use predicated stores in loop exits
>>
>> The basic idea is that we don't worry about solving the legality
question
>> above, and just insert a store which is predicated on a condition which
is
>> true exactly when the original store ran.  In pseudo code, this looks
>> something like:
>>
>> bool StoreLegal = false;
>> int LocalCount = g_count;
>> for (int = 0; i < N; i++)
>>   if (a[i] == 0) {
>>     LocalCount++;
>>     StoreLegal = true;
>>   }
>> if (StoreLegal) g_count = LocalCount;
>>
>> There are two obvious concerns here:
>>
>>    1. The predicated store might be expensive in practice - true for
>>    most current architectures.
>>    2. We''re introducing an extra boolean phi cycle around the
loop.
>>
>> Talking to a couple of folks offline at the socials over the last few
>> months, the former seems to be the main objection.  I think we can
control
>> this by restricting this transform to when the loop is believed to have
a
>> high trip count and the conditional path is taken with some frequency.
>> Getting feedback on this particular point is one of my main reasons for
>> writing this mail.
>>
>> The second objection can frequently be resolved by finding other
>> expressions which evaluate to the same boolean.  (In this case, if
>> LocalCount != LocalCountOrig assuming i doesn't overflow.)  We
already have
>> a framework with SCEV to do these replacements.  Though, from some
quick
>> testing, it would definitely need strengthening.  However, SCEV
can't
>> remove the extra phi in all cases, so we have to be okay with the extra
phi
>> cycle in the general case.  This seems worthwhile to me, but thoughts?
>>
>>
>> Proposal B - Separate load and store handling into distinct transforms
>>
>> (For folks I've discussed this with before, this part is all new.)
>>
>> Thinking about the problem, it finally occurred to me that we can
>> decompose the original example into two steps: getting the loads out of
the
>> loop, and sinking the stores out of the loop.  If we can accomplish the
>> former, but not the later, we've still made meaningful progress.
>>
>> So, what'd we'd essentially have is a load only transformation
which
>> produces this:
>> int LocalCount = g_count;
>> for (int = 0; i < N; i++)
>>   if (a[i] == 0) {
>>     LocalCount++;
>>     g_count = LocalCount;
>>   }
>>
>> At this point, we've reduced memory traffic by half, and opened up
the
>> possibility that other parts of the optimizer can exploit the
simplified
>> form.  The last point is particular interesting since we generally try
to
>> canonicalize loads out of loops, and much of the optimizer is tuned for
a
>> form with as much as possible being loop invariant.  As one example,
>> completely by accident, there's some work going on in the
LoopVectorizer
>> right now to handle such stores to loop invariant addresses during
>> vectorization.  Putting the two pieces together would let us fully
>> vectorize this loop without needing to sink the stores at all.
>>
>> In practice, the existing implementation in LICM would cleanly split
>> along these lines with little problem.
>>
>> One way of looking at this load specific transform is as an extreme
form
>> of PRE (partial redundancy elimination).  Our current PRE
implementation
>> doesn't handle cases this complicated.
>>
> It occurred to my later that simply framing the new transform as a
> separate pass (LoopPRE) and using the same AST + SSA construction approach
> would be straight forward.  So, if folks think that having an aggressive
> form of load PRE in LICM is going a bit too far, it'd be easy to
represent
> as an optional separate pass.  I'd still prefer having LICM contain the
> logic though.
>
> Thoughts?
>>
>> Philip
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180913/efa66c72/attachment.html>

John Brawn via llvm-dev

2018-Sep-14 11:56 UTC

head link

[llvm-dev] Generalizing load/store promotion in LICM

For doing PRE on the load, it looks like there’s only two things stopping GVN
PRE from doing it:
* GVN::PerformLoadPRE doesn’t hoist loads that are conditional. Probably this
can be overcome with some kind of
     heuristic that allows it to happen in loops when the blocks where a load
would have to be inserted are outside
     the loop.
* IsFullyAvailableInBlock goes around the loop and double-counts the entry-edge
into the loop as unavailable. I’m
     pretty sure this can be easily fixed by marking the block that
PerformLoadPRE calls IsFullyAvailableInBlock on as
     FullyAvailable, so it stops there when following the back-edge.
I did a quick experiment (solving the first problem by just commenting out that
check) and this worked for a simple
test case of
int x;
int arr[32];
void fn(int n) {
   for (int i = 0; i < n; i++) {
     if (arr[i]) {
       x++;
     }
   }
}
So perhaps the load PRE half can be done without a separate pass.

John

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Alina
Sbirlea via llvm-dev
Sent: 13 September 2018 22:01
To: listmail at philipreames.com
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Generalizing load/store promotion in LICM

For context, I've been looking into replacing the use of AST
(AliasSetTracker) with MemorySSA in LICM. This works great for sinking/hoisting
but does not apply well for promotion, and one of the solutions I considered is
precisely ripping out the promotion part and replacing its functionality with a
separate PRE pass + possibly store sinking. FWIW I think that's the right
way to go.
I did not get deep enough into working on the solution but I would gladly have a
detailed discussion to move this forward.

Reading into detail now.

Thanks,
Alina

On Thu, Sep 13, 2018 at 1:43 PM Philip Reames <listmail at
philipreames.com<mailto:listmail at philipreames.com>> wrote:
(minor inline additions)

On 09/13/2018 01:51 AM, Chandler Carruth wrote:

Haven't had time to dig into this, but wanted to add +Alina
Sbirlea<mailto:asbirlea at google.com> to the thread as she has been
working on promotion and other aspects of LICM for a long time here.
Thanks!

On Wed, Sep 12, 2018 at 11:41 PM Philip Reames <listmail at
philipreames.com<mailto:listmail at philipreames.com>> wrote:

I'm thinking about making some semi radical changes to load store promotion
works in LICM, and figured it would be a good idea to get buy in before I
actually started writing code.  :)

TLDR: legality of sinking stores to exits is hard, can we separate load handling
into a super aggressive form of PRE, and use predicated stores to avoid solving
legality question?



Background

We've been seeing an interesting class of problems recently that looks
roughly like this:

for (int = 0; i < N; i++)
  if (a[i] == 0) // some data dependent check
    g_count++; // conditional load and store to shared location

The critical detail in this example is that g_count is a global location which
may be accessed concurrently* by multiple threads.  The challenge here is that
we don't know whether the store ever executes, and we're not allowed to
insert a store along any path that didn't originally contain them.  Said
differently, if all elements in "a" are non-zero, we're not
allowed to store to g_count.  We do know that the g_count location is
dereferenceable though.

(*Please, let's avoid the memory model derailment here.  I'm simplifying
and C++ language rules aren't real useful for my Java language frontend
anyways.  In practice, all the access are atomic, but unordered, but we'll
leave that out of discussion otherwise.)

I have two approaches I'm considering here.  These are orthogonal, but I
suspect we'll want to implement both.



Proposal A - Use predicated stores in loop exits

The basic idea is that we don't worry about solving the legality question
above, and just insert a store which is predicated on a condition which is true
exactly when the original store ran.  In pseudo code, this looks something like:

bool StoreLegal = false;
int LocalCount = g_count;
for (int = 0; i < N; i++)
  if (a[i] == 0) {
    LocalCount++;
    StoreLegal = true;
  }
if (StoreLegal) g_count = LocalCount;

There are two obvious concerns here:

  1.  The predicated store might be expensive in practice - true for most
current architectures.
  2.  We''re introducing an extra boolean phi cycle around the loop.

Talking to a couple of folks offline at the socials over the last few months,
the former seems to be the main objection.  I think we can control this by
restricting this transform to when the loop is believed to have a high trip
count and the conditional path is taken with some frequency.   Getting feedback
on this particular point is one of my main reasons for writing this mail.

The second objection can frequently be resolved by finding other expressions
which evaluate to the same boolean.  (In this case, if LocalCount !=
LocalCountOrig assuming i doesn't overflow.)  We already have a framework
with SCEV to do these replacements.  Though, from some quick testing, it would
definitely need strengthening.  However, SCEV can't remove the extra phi in
all cases, so we have to be okay with the extra phi cycle in the general case. 
This seems worthwhile to me, but thoughts?



Proposal B - Separate load and store handling into distinct transforms

(For folks I've discussed this with before, this part is all new.)

Thinking about the problem, it finally occurred to me that we can decompose the
original example into two steps: getting the loads out of the loop, and sinking
the stores out of the loop.  If we can accomplish the former, but not the later,
we've still made meaningful progress.

So, what'd we'd essentially have is a load only transformation which
produces this:
int LocalCount = g_count;
for (int = 0; i < N; i++)
  if (a[i] == 0) {
    LocalCount++;
    g_count = LocalCount;
  }

At this point, we've reduced memory traffic by half, and opened up the
possibility that other parts of the optimizer can exploit the simplified form. 
The last point is particular interesting since we generally try to canonicalize
loads out of loops, and much of the optimizer is tuned for a form with as much
as possible being loop invariant.  As one example, completely by accident,
there's some work going on in the LoopVectorizer right now to handle such
stores to loop invariant addresses during vectorization.  Putting the two pieces
together would let us fully vectorize this loop without needing to sink the
stores at all.

In practice, the existing implementation in LICM would cleanly split along these
lines with little problem.

One way of looking at this load specific transform is as an extreme form of PRE
(partial redundancy elimination).  Our current PRE implementation doesn't
handle cases this complicated.
It occurred to my later that simply framing the new transform as a separate pass
(LoopPRE) and using the same AST + SSA construction approach would be straight
forward.  So, if folks think that having an aggressive form of load PRE in LICM
is going a bit too far, it'd be easy to represent as an optional separate
pass.  I'd still prefer having LICM contain the logic though.


Thoughts?

Philip

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180914/725d4714/attachment-0001.html>

Philip Reames via llvm-dev

2018-Sep-14 23:25 UTC

head link

[llvm-dev] Generalizing load/store promotion in LICM

This is going OT from the original thread, but, what the heck...

Alina, can you explain the challenge with implementing promotion over 
MemorySSA?  On the surface, it seems like it should be fairly straight 
forward to provide an alias set like abstraction over MemorySSA.  What 
am I missing?

Here's a sketch of how I'd think to approach this:

Visit the MemoryPhi nodes in a loop.  Every interesting promotion case 
(mod in loop) must be involved in at least one MemoryPhi cycle.

For each MemoryPhi in the header, create a set of all MemoryDef 
instructions involved.  (I'm saying this in terms of instructions, but 
sets of MemoryLocations should be equivalent.)

For each instruction in loop, identify it's (optimized) memory def.  If 
that def is outside loop, and we're looking at a load/store, then we can 
trivially hoist/sink (aliasing wise only).  If the def is in the loop, 
it must be involved in one of our cycles, add it to related alias set.

When I last looked, we were debating whether MemoryPhis were optimized 
by default.  Did we end up deciding they weren't?  If so, I can 
definitely see the problem there.  Even then, constructing an AST for 
only the instructions involved in a loop MemoryPhi cycle should be 
pretty cheap.


Separately, can you summarize what the overall status of MemorySSA is?  
I've lost track.  It looks like it's enabled by default for EarlyCSE.  
So, does that mean we believe the bugs have been worked out and we 
"just" need to plumb it through the pipeline?

Philip


On 09/13/2018 02:00 PM, Alina Sbirlea wrote:> For context, I've been looking into replacing the use of AST 
> (AliasSetTracker) with MemorySSA in LICM. This works great for 
> sinking/hoisting but does not apply well for promotion, and one of the 
> solutions I considered is precisely ripping out the promotion part and 
> replacing its functionality with a separate PRE pass + possibly store 
> sinking. FWIW I think that's the right way to go.
> I did not get deep enough into working on the solution but I would 
> gladly have a detailed discussion to move this forward.
>
> Reading into detail now.
>
> Thanks,
> Alina
>
> On Thu, Sep 13, 2018 at 1:43 PM Philip Reames 
> <listmail at philipreames.com <mailto:listmail at
philipreames.com>> wrote:
>
>     (minor inline additions)
>
>     On 09/13/2018 01:51 AM, Chandler Carruth wrote:
>>     Haven't had time to dig into this, but wanted to add +Alina
>>     Sbirlea <mailto:asbirlea at google.com> to the thread as she
has
>>     been working on promotion and other aspects of LICM for a long
>>     time here.
>     Thanks!
>>     On Wed, Sep 12, 2018 at 11:41 PM Philip Reames
>>     <listmail at philipreames.com <mailto:listmail at
philipreames.com>> wrote:
>>
>>         I'm thinking about making some semi radical changes to load
>>         store promotion works in LICM, and figured it would be a good
>>         idea to get buy in before I actually started writing code.  :)
>>
>>         TLDR: legality of sinking stores to exits is hard, can we
>>         separate load handling into a super aggressive form of PRE,
>>         and use predicated stores to avoid solving legality question?
>>
>>
>>         Background
>>
>>         We've been seeing an interesting class of problems recently
>>         that looks roughly like this:
>>
>>         for (int = 0; i < N; i++)
>>           if (a[i] == 0) // some data dependent check
>>             g_count++; // conditional load and store to shared location
>>
>>         The critical detail in this example is that g_count is a
>>         global location which may be accessed concurrently* by
>>         multiple threads.  The challenge here is that we don't know
>>         whether the store ever executes, and we're not allowed to
>>         insert a store along any path that didn't originally
contain
>>         them.  Said differently, if all elements in "a" are
non-zero,
>>         we're not allowed to store to g_count.  We do know that the
>>         g_count location is dereferenceable though.
>>
>>         (*Please, let's avoid the memory model derailment here. 
I'm
>>         simplifying and C++ language rules aren't real useful for
my
>>         Java language frontend anyways.  In practice, all the access
>>         are atomic, but unordered, but we'll leave that out of
>>         discussion otherwise.)
>>
>>         I have two approaches I'm considering here. These are
>>         orthogonal, but I suspect we'll want to implement both.
>>
>>
>>         Proposal A - Use predicated stores in loop exits
>>
>>         The basic idea is that we don't worry about solving the
>>         legality question above, and just insert a store which is
>>         predicated on a condition which is true exactly when the
>>         original store ran.  In pseudo code, this looks something like:
>>
>>         bool StoreLegal = false;
>>         int LocalCount = g_count;
>>         for (int = 0; i < N; i++)
>>           if (a[i] == 0) {
>>             LocalCount++;
>>             StoreLegal = true;
>>           }
>>         if (StoreLegal) g_count = LocalCount;
>>
>>         There are two obvious concerns here:
>>
>>          1. The predicated store might be expensive in practice -
>>             true for most current architectures.
>>          2. We''re introducing an extra boolean phi cycle
around the
>>             loop.
>>
>>         Talking to a couple of folks offline at the socials over the
>>         last few months, the former seems to be the main objection. 
>>         I think we can control this by restricting this transform to
>>         when the loop is believed to have a high trip count and the
>>         conditional path is taken with some frequency. Getting
>>         feedback on this particular point is one of my main reasons
>>         for writing this mail.
>>
>>         The second objection can frequently be resolved by finding
>>         other expressions which evaluate to the same boolean.  (In
>>         this case, if LocalCount != LocalCountOrig assuming i
doesn't
>>         overflow.)  We already have a framework with SCEV to do these
>>         replacements.  Though, from some quick testing, it would
>>         definitely need strengthening.  However, SCEV can't remove
>>         the extra phi in all cases, so we have to be okay with the
>>         extra phi cycle in the general case.  This seems worthwhile
>>         to me, but thoughts?
>>
>>
>>         Proposal B - Separate load and store handling into distinct
>>         transforms
>>
>>         (For folks I've discussed this with before, this part is
all
>>         new.)
>>
>>         Thinking about the problem, it finally occurred to me that we
>>         can decompose the original example into two steps: getting
>>         the loads out of the loop, and sinking the stores out of the
>>         loop.  If we can accomplish the former, but not the later,
>>         we've still made meaningful progress.
>>
>>         So, what'd we'd essentially have is a load only
>>         transformation which produces this:
>>         int LocalCount = g_count;
>>         for (int = 0; i < N; i++)
>>           if (a[i] == 0) {
>>             LocalCount++;
>>             g_count = LocalCount;
>>           }
>>
>>         At this point, we've reduced memory traffic by half, and
>>         opened up the possibility that other parts of the optimizer
>>         can exploit the simplified form.  The last point is
>>         particular interesting since we generally try to canonicalize
>>         loads out of loops, and much of the optimizer is tuned for a
>>         form with as much as possible being loop invariant.  As one
>>         example, completely by accident, there's some work going on
>>         in the LoopVectorizer right now to handle such stores to loop
>>         invariant addresses during vectorization. Putting the two
>>         pieces together would let us fully vectorize this loop
>>         without needing to sink the stores at all.
>>
>>         In practice, the existing implementation in LICM would
>>         cleanly split along these lines with little problem.
>>
>>         One way of looking at this load specific transform is as an
>>         extreme form of PRE (partial redundancy elimination).  Our
>>         current PRE implementation doesn't handle cases this
>>         complicated.
>>
>     It occurred to my later that simply framing the new transform as a
>     separate pass (LoopPRE) and using the same AST + SSA construction
>     approach would be straight forward. So, if folks think that having
>     an aggressive form of load PRE in LICM is going a bit too far,
>     it'd be easy to represent as an optional separate pass.  I'd
still
>     prefer having LICM contain the logic though.
>>
>>         Thoughts?
>>
>>         Philip
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180914/3616b193/attachment-0001.html>

llvm dev - Sep 2018 - Generalizing load/store promotion in LICM

[llvm-dev] Generalizing load/store promotion in LICM

[llvm-dev] Generalizing load/store promotion in LICM

[llvm-dev] Generalizing load/store promotion in LICM

[llvm-dev] Generalizing load/store promotion in LICM