thr3ads.net - llvm dev - [llvm-dev] Reordering of load/stores using MemorySSA [Oct 2018]

If this information is useful, please help other people find it:
Share via:

Juneyoung Lee via llvm-dev

2018-Oct-02 04:03 UTC

[llvm-dev] Reordering of load/stores using MemorySSA

Hello all,

It seems that it is insufficient to
rely on MemorySSA to correctly reorder load/stores.

For example, we have this following code:

v = load q
store 10, p

=>

store 10, p
v = load q // This is incorrect if p and q may alias

In the MemorySSA representation however,
there's no syntactic dependency between load/store,
because MemorySSA before transformation would look like this:

USE(M0) // corresponding to `v = load q`
M1 = DEF(M0) // corresponding to `store 10, p`

Reordering these two nodes does not break syntactic dependency,
hence additional flow-sensitive analysis is needed to check whether
reordering is valid.
To my understanding, GVNHoist seems to do that:
Memory operations in basic blocks are checked by
GVNHoist::hasMemoryUse().


If USE nodes of MemorySSA also have left-hand side, we can correctly
represent this kind of reorderability IMHO.

For the example above:

U1 = USE(M0) // corresponding to `v = load q`
M1 = DEF(M0, U1) // corresponding to `store 10, p`

Now there is a syntactic dependency between M1 and U1, hence
we can syntactically check whether reordering is allowed without any
flow-sensitive analysis.
If the store and load don't alias, we can remove U1 from DEF:

U1 = USE(M0) // corresponding to `v = load q`
M1 = DEF(M0) // corresponding to `store 10, p`

Then, the reordering is valid.

One issue is regarding PHI node: possible solution is to make
two kinds of PHIS - PHIs for DEFs & PHIS for USEs, so there are
at most 2 PHIs.


Does this idea seem to make sense? Any thoughts?

Best Regards,
Juneyoung Lee

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181002/e330a088/attachment.html>

Daniel Berlin via llvm-dev

2018-Oct-02 04:38 UTC

head link

[llvm-dev] Reordering of load/stores using MemorySSA

On Mon, Oct 1, 2018 at 9:03 PM Juneyoung Lee <juneyoung.lee at
sf.snu.ac.kr> wrote:>
>
> Hello all,
>
> It seems that it is insufficient to
> rely on MemorySSA to correctly reorder load/stores.
It depends on what you are trying to do.
>
> For example, we have this following code:
>
> v = load q
> store 10, p
>
> =>
>
> store 10, p
> v = load q // This is incorrect if p and q may alias
>
> In the MemorySSA representation however,
> there's no syntactic dependency between load/store,
> because MemorySSA before transformation would look like this:
>
> USE(M0) // corresponding to `v = load q`
> M1 = DEF(M0) // corresponding to `store 10, p`
This is correct, MemorySSA, like SSA, represents upwards dependencies
and not downwards ones.

Much like there is SSU and SSI, and SSA/SSU/SSI on the reverse flow
graph, there are analogues for MemorySSA that we do not implement.
>
> Reordering these two nodes does not break syntactic dependency,This is true in regular SSA as well :)
It just happens that memory is different due to aliasing.
> hence additional flow-sensitive analysis is needed to check whether
reordering is valid.
> To my understanding, GVNHoist seems to do that:
> Memory operations in basic blocks are checked by
> GVNHoist::hasMemoryUse().
>Correct.
>
> If USE nodes of MemorySSA also have left-hand side, we can correctly
> represent this kind of reorderability IMHO.
>
> For the example above:
>
> U1 = USE(M0) // corresponding to `v = load q`
> M1 = DEF(M0, U1) // corresponding to `store 10, p`
It is not this simple.
You also need to avoid multiple reaching uses for a single def.
IE
U0 = Use(M0) // corresponding to `v = load q`
U1 = Use(M1) // corresponding to `t = load r` where r aliases q aliases p
M1 = DEF(M0, ???)

??? must be U0, U1 above.

If you have the analogue of phis, you will require that analogue to
merge the multiple reaching uses, and you may have multiple merges per
block.
IE
U2 = merge(U0, U1)
M1 = DEF (M0, U2)
>
> Now there is a syntactic dependency between M1 and U1, hence
> we can syntactically check whether reordering is allowed without any
flow-sensitive analysis.
What's the goal?
Much like SSA is meant to speed up certain dataflow problems, so is
this. Something would have to be slow to make this worth it.

MemorySSA in particular is deliberately an overapproximation in LLVM
(and GCC), because we discovered exactness is super-expensive,
impossible, and going as far as you can buys you almost nothing in
practice.
However, it makes for a much faster memorydependenceanalysis (which in
llvm is also only upwards and not downwards).

Here, it's not clear that GVNHoist is slow enough to make building an
entire ReverseMemorySSA form worth it.
In particular, building reverse MemorySSA will double the number of
alias checks in most cases.
You would almost certainly have to fix the BasicAA caching issues and
finally split BasicAA or something to make it viable.
(MemorySSA already pays a high price for BasicAA being super slow).
> If the store and load don't alias, we can remove U1 from DEF:
>
> U1 = USE(M0) // corresponding to `v = load q`
> M1 = DEF(M0) // corresponding to `store 10, p`
>
> Then, the reordering is valid.
>
> One issue is regarding PHI node: possible solution is to make
> two kinds of PHIS - PHIs for DEFs & PHIS for USEs, so there are
> at most 2 PHIs.
>
>
> Does this idea seem to make sense? Any thoughts?
>It's certainly correct, i have considered doing it before but never
had something to test out whether it was really worth the cost or not.

One previous blocker was that it requires a correct postdomtree to be
able to do placement of use phis.
That should now be solved.

I suspect you will find the cost is not worth it without a meaningful
motivating optimization (like store PRE or something)

Juneyoung Lee via llvm-dev

2018-Oct-02 11:44 UTC

head link

[llvm-dev] Reordering of load/stores using MemorySSA

Hello Daniel,

Thank you for the reply.
My impression was that embedding such information into MemorySSA
would be helpful for shorter compilation time
as well as people who needs this kind of information for writing
optimizations,
but building time of MemorySSA also should have been considered.
> I suspect you will find the cost is not worth it without a meaningful
> motivating optimization (like store PRE or something)
I think implementing store PRE for very simple cases seems interesting
(not only for MemorySSA, but the optimization itself as well).
I can give it a try to implement it if it is
straightforward.>From which file should I start?
By the way, while looking through memory related optimizations, I found
this patch -
https://reviews.llvm.org/D5422 .
Would it make sense if store PRE deals with atomic operations
as well? I wonder whether people are very interested in
optimizing atomic operations / there are more patches like this.

Thank you,
Juneyoung Lee

On Tue, Oct 2, 2018 at 1:38 PM Daniel Berlin <dberlin at dberlin.org>
wrote:
> On Mon, Oct 1, 2018 at 9:03 PM Juneyoung Lee <juneyoung.lee at
sf.snu.ac.kr>
> wrote:
> >
> >
> > Hello all,
> >
> > It seems that it is insufficient to
> > rely on MemorySSA to correctly reorder load/stores.
>
> It depends on what you are trying to do.
>
> >
> > For example, we have this following code:
> >
> > v = load q
> > store 10, p
> >
> > =>
> >
> > store 10, p
> > v = load q // This is incorrect if p and q may alias
> >
> > In the MemorySSA representation however,
> > there's no syntactic dependency between load/store,
> > because MemorySSA before transformation would look like this:
> >
> > USE(M0) // corresponding to `v = load q`
> > M1 = DEF(M0) // corresponding to `store 10, p`
>
> This is correct, MemorySSA, like SSA, represents upwards dependencies
> and not downwards ones.
>
> Much like there is SSU and SSI, and SSA/SSU/SSI on the reverse flow
> graph, there are analogues for MemorySSA that we do not implement.
>
> >
> > Reordering these two nodes does not break syntactic dependency,
> This is true in regular SSA as well :)
> It just happens that memory is different due to aliasing.
>
> > hence additional flow-sensitive analysis is needed to check whether
> reordering is valid.
>
> > To my understanding, GVNHoist seems to do that:
> > Memory operations in basic blocks are checked by
> > GVNHoist::hasMemoryUse().
> >
> Correct.
>
> >
> > If USE nodes of MemorySSA also have left-hand side, we can correctly
> > represent this kind of reorderability IMHO.
> >
> > For the example above:
> >
> > U1 = USE(M0) // corresponding to `v = load q`
> > M1 = DEF(M0, U1) // corresponding to `store 10, p`
>
> It is not this simple.
> You also need to avoid multiple reaching uses for a single def.
> IE
> U0 = Use(M0) // corresponding to `v = load q`
> U1 = Use(M1) // corresponding to `t = load r` where r aliases q aliases p
> M1 = DEF(M0, ???)
>
> ??? must be U0, U1 above.
>
> If you have the analogue of phis, you will require that analogue to
> merge the multiple reaching uses, and you may have multiple merges per
> block.
> IE
> U2 = merge(U0, U1)
> M1 = DEF (M0, U2)
>
> >
> > Now there is a syntactic dependency between M1 and U1, hence
> > we can syntactically check whether reordering is allowed without any
> flow-sensitive analysis.
>
> What's the goal?
> Much like SSA is meant to speed up certain dataflow problems, so is
> this. Something would have to be slow to make this worth it.
>
> MemorySSA in particular is deliberately an overapproximation in LLVM
> (and GCC), because we discovered exactness is super-expensive,
> impossible, and going as far as you can buys you almost nothing in
> practice.
> However, it makes for a much faster memorydependenceanalysis (which in
> llvm is also only upwards and not downwards).
>
> Here, it's not clear that GVNHoist is slow enough to make building an
> entire ReverseMemorySSA form worth it.
> In particular, building reverse MemorySSA will double the number of
> alias checks in most cases.
> You would almost certainly have to fix the BasicAA caching issues and
> finally split BasicAA or something to make it viable.
> (MemorySSA already pays a high price for BasicAA being super slow).
>
> > If the store and load don't alias, we can remove U1 from DEF:
> >
> > U1 = USE(M0) // corresponding to `v = load q`
> > M1 = DEF(M0) // corresponding to `store 10, p`
> >
> > Then, the reordering is valid.
> >
> > One issue is regarding PHI node: possible solution is to make
> > two kinds of PHIS - PHIs for DEFs & PHIS for USEs, so there are
> > at most 2 PHIs.
> >
> >
> > Does this idea seem to make sense? Any thoughts?
> >
> It's certainly correct, i have considered doing it before but never
> had something to test out whether it was really worth the cost or not.
>
> One previous blocker was that it requires a correct postdomtree to be
> able to do placement of use phis.
> That should now be solved.
>
> I suspect you will find the cost is not worth it without a meaningful
> motivating optimization (like store PRE or something)
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181002/2dc3ccdf/attachment.html>

llvm dev - Oct 2018 - Reordering of load/stores using MemorySSA

[llvm-dev] Reordering of load/stores using MemorySSA

[llvm-dev] Reordering of load/stores using MemorySSA

[llvm-dev] Reordering of load/stores using MemorySSA