thr3ads.net - llvm dev - [llvm-dev] [RFC] Register Rematerialization (remat) Extension [Sep 2016]

If this information is useful, please help other people find it:
Share via:

vivek pandya via llvm-dev

2016-Sep-12 15:51 UTC

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

Hello Developers,

I am working with my other batchmates to improve register remat in LLVM.
We want to remat live ranges made of multiple instruction.

Just to support our proposal here is a simple example that currently remat
does
not cover

$ cat ~/tmp/tl.c
void foo(long);
void bar() {
  for (int i = 0; i < 1600; ++i)
    foo(3494348345984503943);
}

$ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
...
# BB#0:                                 # %entry
...
        lis 3, 12414
        ori 3, 3, 27470
        sldi 3, 3, 32
        oris 3, 3, 35809
        ori 30, 3, 20615
...
.LBB0_1:                                # %for.body
        mr 3, 30
        bl foo
...

There is a sequence of instructions used to materialize the constant, the
first
one (the lis) is trivially rematerialiable, and the others depend only on
that one,
and have no side effects. If we otherwise needed to spill the constant, we
might
wish to move the entire set of instructions that compute the value into the
loop body.
(Many thanks to Hal Finkel for this example and head start)

We are following very old but effective paper "Rematerialization"
http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1]

This extension will specially improve code quality for RICS backends like
powerpc, MIPS, ARM, AArch64 etc.

Here is a tentative apporach ( after reading the above mentioned paper and
current remat code) that I would like to follow.

Please share your views because this may be totally wrong direction. Also I
will
be happy if this gets into main line LLVM code but if community don't want
to make remat heavy than please guide me for my class project perspective.

1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I
think it does not require to build SSA graph and converting it back after
optimization completed as mentioned in [1]

2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional
Constant
Propagation based on Wegman and Zadeck's work
http://dl.acm.org/citation.cfm?id=103136) as desribed in [1]. This pass
will be scheduled to run before register allocation.

3 ) Output of the pass added in Step 2 will be a Map of def to instructions
pointers (instructions which can be used to remat the given live range).
The map will contain live ranges which is due to single instruction and
multiple instructions.

4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the
Map
when a spill is required for RA.

5 ) The remat transformation APIs like rematerializeAt() will be teached to
remat
live ranges with multiple instructions too.

6 ) A cost analysis will be require to decide between remat and spill. This
should be based on at least two factors register pressure and spill cost

Few points:
--------------
* The analysis pass to be addes as per (2) will use target specific
information
from TargetInstrInfo.cpp as the current remat infrastructure uses.

* This approach will not be on demand as the current approach is (i.e remat
specific
code will be executed only if there is a spill) so the pass in (2) can be an
overhead so we may want it to enable only for higher level of optimization.

* Will it be possible to use existing SCCP.cpp code with few modification
to lattice
and related mathematical operation so that it can serve both purpose?

* No changes in current register allocators or spill framework will be
required
because remat entry point will be LiveRangeEdit.

Any other way with less overhead is always welcomed.
Please help us developing a plan to implement this.

Hoping for comments!

Sincerely,
Vivek
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160912/215d72cf/attachment.html>

Andrew Trick via llvm-dev

2016-Sep-12 17:14 UTC

head link

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

> On Sep 12, 2016, at 8:51 AM, vivek pandya via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> 
> 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I
think it does not require to build SSA graph and converting it back after
optimization completed as mentioned in [1]
> 
> 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional
Constant
> Propagation based on Wegman and Zadeck's work
http://dl.acm.org/citation.cfm?id=103136
<http://dl.acm.org/citation.cfm?id=103136>) as desribed in [1]. This pass
will be scheduled to run before register allocation.
> 
> 3 ) Output of the pass added in Step 2 will be a Map of def to instructions
pointers (instructions which can be used to remat the given live range). The map
will contain live ranges which is due to single instruction and multiple
instructions.
LiveIntervals maintains a quasi-SSA form via VNInfo. It does not allow efficient
def-use queries, but use-def is there, which is all that you should need.

It would be great to have better remat during regalloc, but please try to avoid
building additional state that needs to be maintained.

-Andy
> 
> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the
Map
> when a spill is required for RA.
> 
> 5 ) The remat transformation APIs like rematerializeAt() will be teached to
remat
> live ranges with multiple instructions too.
> 
> 6 ) A cost analysis will be require to decide between remat and spill. This
should be based on at least two factors register pressure and spill cost
> 
> Few points:
> --------------
> * The analysis pass to be addes as per (2) will use target specific
information
> from TargetInstrInfo.cpp as the current remat infrastructure uses.
> 
> * This approach will not be on demand as the current approach is (i.e remat
specific
> code will be executed only if there is a spill) so the pass in (2) can be
an
> overhead so we may want it to enable only for higher level of optimization.
> 
> * Will it be possible to use existing SCCP.cpp code with few modification
to lattice
> and related mathematical operation so that it can serve both purpose?
> 
> * No changes in current register allocators or spill framework will be
required
> because remat entry point will be LiveRangeEdit.
> 
> Any other way with less overhead is always welcomed.
> Please help us developing a plan to implement this.
> 
> Hoping for comments!
> 
> Sincerely,
> Vivek
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160912/e2228945/attachment.html>

Philip Reames via llvm-dev

2016-Sep-13 14:50 UTC

head link

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

On 09/12/2016 08:51 AM, vivek pandya via llvm-dev wrote:> Hello Developers,
>
> I am working with my other batchmates to improve register remat in LLVM.
> We want to remat live ranges made of multiple instruction.
>
> Just to support our proposal here is a simple example that currently 
> remat does
> not cover
>
> $ cat ~/tmp/tl.c
> void foo(long);
> void bar() {
>   for (int i = 0; i < 1600; ++i)
>     foo(3494348345984503943);
> }
>
> $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
> ...
> # BB#0:                                 # %entry
> ...
>         lis 3, 12414
>         ori 3, 3, 27470
>         sldi 3, 3, 32
>         oris 3, 3, 35809
>         ori 30, 3, 20615
> ...
> .LBB0_1:                                # %for.body
>         mr 3, 30
>         bl foo
> ...OT: Instead of viewing this as a *single* remat problem, can you view 
this as a series of remat problems?  If each instruction can be moved 
while only increasing the live range of a single register and shrinking 
the live range of another, then moving each instruction one by one and 
iterating gives the same effect.  Note that this only (obviously) works 
for single use defs.  Extending it to multiple uses would require a more 
complicated cost model as you point out.>
> There is a sequence of instructions used to materialize the constant, 
> the first
> one (the lis) is trivially rematerialiable, and the others depend only 
> on that one,
> and have no side effects. If we otherwise needed to spill the 
> constant, we might
> wish to move the entire set of instructions that compute the value 
> into the loop body.
> (Many thanks to Hal Finkel for this example and head start)
>
> We are following very old but effective paper "Rematerialization"
> http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1]
>
> This extension will specially improve code quality for RICS backends like
> powerpc, MIPS, ARM, AArch64 etc.
>
> Here is a tentative apporach ( after reading the above mentioned paper 
> and current remat code) that I would like to follow.
>
> Please share your views because this may be totally wrong direction. 
> Also I will
> be happy if this gets into main line LLVM code but if community don't
want
> to make remat heavy than please guide me for my class project perspective.
>
> 1 ) As LLVM MI is already in SSA form before reg allocation so for 
> LLVM I think it does not require to build SSA graph and converting it 
> back after optimization completed as mentioned in [1]
>
> 2 ) We would like to add a pass similar to SCCP.cpp (Sparse 
> Conditional Constant
> Propagation based on Wegman and Zadeck's work 
> http://dl.acm.org/citation.cfm?id=103136) as desribed in [1]. This 
> pass will be scheduled to run before register allocation.
>
> 3 ) Output of the pass added in Step 2 will be a Map of def to 
> instructions pointers (instructions which can be used to remat the 
> given live range). The map will contain live ranges which is due to 
> single instruction and multiple instructions.This sounds overly complex.  Can you implement this without needing the 
new side structure?  Maintaining extra state and keeping it up to date 
is expensive.  (From a maintenance and code complexity
perspective.)>
> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from 
> the Map
> when a spill is required for RA.
>
> 5 ) The remat transformation APIs like rematerializeAt() will be 
> teached to remat
> live ranges with multiple instructions too.
>
> 6 ) A cost analysis will be require to decide between remat and spill. 
> This should be based on at least two factors register pressure and 
> spill cost
>
> Few points:
> --------------
> * The analysis pass to be addes as per (2) will use target specific 
> information
> from TargetInstrInfo.cpp as the current remat infrastructure uses.
>
> * This approach will not be on demand as the current approach is (i.e 
> remat specific
> code will be executed only if there is a spill) so the pass in (2) can 
> be an
> overhead so we may want it to enable only for higher level of 
> optimization.
This would be unfortunate.  Not fatal, just unfortunate.>
> * Will it be possible to use existing SCCP.cpp code with few 
> modification to lattice
> and related mathematical operation so that it can serve both purpose?
>
> * No changes in current register allocators or spill framework will be 
> required
> because remat entry point will be LiveRangeEdit.
>
> Any other way with less overhead is always welcomed.
> Please help us developing a plan to implement this.
>
> Hoping for comments!
>
> Sincerely,
> Vivek
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160913/92f9c0bb/attachment.html>

vivek pandya via llvm-dev

2016-Sep-13 16:03 UTC

head link

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

On Tue, Sep 13, 2016 at 8:20 PM, Philip Reames <listmail at
philipreames.com>
wrote:
>
>
> On 09/12/2016 08:51 AM, vivek pandya via llvm-dev wrote:
>
> Hello Developers,
>
> I am working with my other batchmates to improve register remat in LLVM.
> We want to remat live ranges made of multiple instruction.
>
> Just to support our proposal here is a simple example that currently remat
> does
> not cover
>
> $ cat ~/tmp/tl.c
> void foo(long);
> void bar() {
>   for (int i = 0; i < 1600; ++i)
>     foo(3494348345984503943);
> }
>
> $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
> ...
> # BB#0:                                 # %entry
> ...
>         lis 3, 12414
>         ori 3, 3, 27470
>         sldi 3, 3, 32
>         oris 3, 3, 35809
>         ori 30, 3, 20615
> ...
> .LBB0_1:                                # %for.body
>         mr 3, 30
>         bl foo
> ...
>
> OT: Instead of viewing this as a *single* remat problem, can you view this
> as a series of remat problems?  If each instruction can be moved while only
> increasing the live range of a single register and shrinking the live range
> of another, then moving each instruction one by one and iterating gives the
> same effect.  Note that this only (obviously) works for single use defs.
> Extending it to multiple uses would require a more complicated cost model
> as you point out.
>
Hello Philip,

Your idea seems to be effective and can be tried out with few changes. But
correct me if I got this wrong, by multiple uses do you mean a same remat
sequence being remat again? In that case we can run the analysis again to
decide which all instructions are required to be remat or we may add a
simple caching layer which can hold analysis result for once compilation
unit.
Any thoughts ?
>
>
>
> There is a sequence of instructions used to materialize the constant, the
> first
> one (the lis) is trivially rematerialiable, and the others depend only on
> that one,
> and have no side effects. If we otherwise needed to spill the constant, we
> might
> wish to move the entire set of instructions that compute the value into
> the loop body.
> (Many thanks to Hal Finkel for this example and head start)
>
> We are following very old but effective paper "Rematerialization"
> http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1]
>
> This extension will specially improve code quality for RICS backends like
> powerpc, MIPS, ARM, AArch64 etc.
>
> Here is a tentative apporach ( after reading the above mentioned paper and
> current remat code) that I would like to follow.
>
> Please share your views because this may be totally wrong direction. Also
> I will
> be happy if this gets into main line LLVM code but if community don't
want
> to make remat heavy than please guide me for my class project perspective.
>
> 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I
> think it does not require to build SSA graph and converting it back after
> optimization completed as mentioned in [1]
>
> 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional
> Constant
> Propagation based on Wegman and Zadeck's work
http://dl.acm.org/citation.
> cfm?id=103136) as desribed in [1]. This pass will be scheduled to run
> before register allocation.
>
> 3 ) Output of the pass added in Step 2 will be a Map of def to
> instructions pointers (instructions which can be used to remat the given
> live range). The map will contain live ranges which is due to single
> instruction and multiple instructions.
>
> This sounds overly complex.  Can you implement this without needing the
> new side structure?  Maintaining extra state and keeping it up to date is
> expensive.  (From a maintenance and code complexity perspective.)
>
>
> Currently I don't think we have any facility to annotate instructions
soto keep analysis available till register allocation we need an immutable
pass.

> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the
> Map
> when a spill is required for RA.
>
> 5 ) The remat transformation APIs like rematerializeAt() will be teached
> to remat
> live ranges with multiple instructions too.
>
> 6 ) A cost analysis will be require to decide between remat and spill.
> This should be based on at least two factors register pressure and spill
> cost
>
> Few points:
> --------------
> * The analysis pass to be addes as per (2) will use target specific
> information
> from TargetInstrInfo.cpp as the current remat infrastructure uses.
>
> * This approach will not be on demand as the current approach is (i.e
> remat specific
> code will be executed only if there is a spill) so the pass in (2) can be
> an
> overhead so we may want it to enable only for higher level of optimization.
>
> This would be unfortunate.  Not fatal, just unfortunate.
>
>
> -Vivek
> * Will it be possible to use existing SCCP.cpp code with few modification
> to lattice
> and related mathematical operation so that it can serve both purpose?
>
> * No changes in current register allocators or spill framework will be
> required
> because remat entry point will be LiveRangeEdit.
>
> Any other way with less overhead is always welcomed.
> Please help us developing a plan to implement this.
>
> Hoping for comments!
>
> Sincerely,
> Vivek
>
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160913/6f6a6105/attachment.html>

Gerolf Hoflehner via llvm-dev

2016-Sep-14 01:42 UTC

head link

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

> On Sep 12, 2016, at 10:14 AM, Andrew Trick via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> 
>> On Sep 12, 2016, at 8:51 AM, vivek pandya via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>> 
>> 
>> 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM
I think it does not require to build SSA graph and converting it back after
optimization completed as mentioned in [1]
>> 
>> 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional
Constant
>> Propagation based on Wegman and Zadeck's work
http://dl.acm.org/citation.cfm?id=103136
<http://dl.acm.org/citation.cfm?id=103136>) as desribed in [1]. This pass
will be scheduled to run before register allocation.
>> 
>> 3 ) Output of the pass added in Step 2 will be a Map of def to
instructions pointers (instructions which can be used to remat the given live
range). The map will contain live ranges which is due to single instruction and
multiple instructions.
> 
> LiveIntervals maintains a quasi-SSA form via VNInfo. It does not allow
efficient def-use queries, but use-def is there, which is all that you should
need.
I also only see a narrow and specific remat cost problem in the example: is it
cheaper is issue a  chain of instructions rather than a fill? And for this a
use-def is sufficient.> 
> It would be great to have better remat during regalloc, but please try to
avoid building additional state that needs to be maintained.
You proposed a fairly complex scheme. The question then is always is it worth
it? To answer that question you would need to investigate and break down the
current remat problems (spills but should remat, remat but should spill, should
remat at a different location, etc) eg. for the llvm test suite, and show that
your algorithms solves the most important ones.

> 
> -Andy
> 
>> 
>> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from
the Map
>> when a spill is required for RA.
>> 
>> 5 ) The remat transformation APIs like rematerializeAt() will be
teached to remat
>> live ranges with multiple instructions too.
>> 
>> 6 ) A cost analysis will be require to decide between remat and spill.
This should be based on at least two factors register pressure and spill cost
>> 
>> Few points:
>> --------------
>> * The analysis pass to be addes as per (2) will use target specific
information
>> from TargetInstrInfo.cpp as the current remat infrastructure uses.
>> 
>> * This approach will not be on demand as the current approach is (i.e
remat specific
>> code will be executed only if there is a spill) so the pass in (2) can
be an
>> overhead so we may want it to enable only for higher level of
optimization.
>> 
>> * Will it be possible to use existing SCCP.cpp code with few
modification to lattice
>> and related mathematical operation so that it can serve both purpose?
>> 
>> * No changes in current register allocators or spill framework will be
required
>> because remat entry point will be LiveRangeEdit.
>> 
>> Any other way with less overhead is always welcomed.
>> Please help us developing a plan to implement this.
>> 
>> Hoping for comments!
>> 
>> Sincerely,
>> Vivek
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160913/5d1b1af6/attachment.html>

Bruce Hoult via llvm-dev

2016-Sep-19 14:10 UTC

head link

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

The idea seems sound, but do you really have a CPU in which such a complex
rematerialization is better than an L1 cache load from the stack frame?

        lis 3, 12414
        ori 3, 3, 27470
        sldi 3, 3, 32
        oris 3, 3, 35809
        ori 30, 3, 20615

I'm not familiar with modern PPC64 but seems like a lose on PPC G5 and from
the docs I quickly found (2 cycle latency on dependent int ALU ops) Power8
too.

OK, maybe (if I didn't screw up the rldimi):

        lis 3, 12414
        lis 30, 35809
        ori 3, 3, 27470
        ori 30, 30, 20615
        rldimi 30, 3, 32, 0

Or is there something that optimizes such sequences building constants?

On Mon, Sep 12, 2016 at 6:51 PM, vivek pandya via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hello Developers,
>
> I am working with my other batchmates to improve register remat in LLVM.
> We want to remat live ranges made of multiple instruction.
>
> Just to support our proposal here is a simple example that currently remat
> does
> not cover
>
> $ cat ~/tmp/tl.c
> void foo(long);
> void bar() {
>   for (int i = 0; i < 1600; ++i)
>     foo(3494348345984503943);
> }
>
> $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
> ...
> # BB#0:                                 # %entry
> ...
>         lis 3, 12414
>         ori 3, 3, 27470
>         sldi 3, 3, 32
>         oris 3, 3, 35809
>         ori 30, 3, 20615
> ...
> .LBB0_1:                                # %for.body
>         mr 3, 30
>         bl foo
> ...
>
> There is a sequence of instructions used to materialize the constant, the
> first
> one (the lis) is trivially rematerialiable, and the others depend only on
> that one,
> and have no side effects. If we otherwise needed to spill the constant, we
> might
> wish to move the entire set of instructions that compute the value into
> the loop body.
> (Many thanks to Hal Finkel for this example and head start)
>
> We are following very old but effective paper "Rematerialization"
> http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1]
>
> This extension will specially improve code quality for RICS backends like
> powerpc, MIPS, ARM, AArch64 etc.
>
> Here is a tentative apporach ( after reading the above mentioned paper and
> current remat code) that I would like to follow.
>
> Please share your views because this may be totally wrong direction. Also
> I will
> be happy if this gets into main line LLVM code but if community don't
want
> to make remat heavy than please guide me for my class project perspective.
>
> 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I
> think it does not require to build SSA graph and converting it back after
> optimization completed as mentioned in [1]
>
> 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional
> Constant
> Propagation based on Wegman and Zadeck's work
http://dl.acm.org/citation.
> cfm?id=103136) as desribed in [1]. This pass will be scheduled to run
> before register allocation.
>
> 3 ) Output of the pass added in Step 2 will be a Map of def to
> instructions pointers (instructions which can be used to remat the given
> live range). The map will contain live ranges which is due to single
> instruction and multiple instructions.
>
> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the
> Map
> when a spill is required for RA.
>
> 5 ) The remat transformation APIs like rematerializeAt() will be teached
> to remat
> live ranges with multiple instructions too.
>
> 6 ) A cost analysis will be require to decide between remat and spill.
> This should be based on at least two factors register pressure and spill
> cost
>
> Few points:
> --------------
> * The analysis pass to be addes as per (2) will use target specific
> information
> from TargetInstrInfo.cpp as the current remat infrastructure uses.
>
> * This approach will not be on demand as the current approach is (i.e
> remat specific
> code will be executed only if there is a spill) so the pass in (2) can be
> an
> overhead so we may want it to enable only for higher level of optimization.
>
> * Will it be possible to use existing SCCP.cpp code with few modification
> to lattice
> and related mathematical operation so that it can serve both purpose?
>
> * No changes in current register allocators or spill framework will be
> required
> because remat entry point will be LiveRangeEdit.
>
> Any other way with less overhead is always welcomed.
> Please help us developing a plan to implement this.
>
> Hoping for comments!
>
> Sincerely,
> Vivek
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160919/136b0901/attachment.html>

Hal Finkel via llvm-dev

2016-Sep-24 17:39 UTC

head link

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

----- Original Message -----
> From: "Bruce Hoult" <bruce at hoult.org>
> To: "vivek pandya" <vivekvpandya at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Hal
Finkel"
> <hfinkel at anl.gov>, "Matthias Braun" <matze at
braunis.de>
> Sent: Monday, September 19, 2016 9:10:17 AM
> Subject: Re: [llvm-dev] [RFC] Register Rematerialization (remat)
> Extension
> The idea seems sound, but do you really have a CPU in which such a
> complex rematerialization is better than an L1 cache load from the
> stack frame?
> lis 3, 12414
> ori 3, 3, 27470
> sldi 3, 3, 32
> oris 3, 3, 35809
> ori 30, 3, 20615
> I'm not familiar with modern PPC64 but seems like a lose on PPC G5
> and from the docs I quickly found (2 cycle latency on dependent int
> ALU ops) Power8 too.
> OK, maybe (if I didn't screw up the rldimi):
> lis 3, 12414
> lis 30, 35809
> ori 3, 3, 27470
> ori 30, 30, 20615
> rldimi 30, 3, 32, 0
> Or is there something that optimizes such sequences building
> constants?I don't know about the G5, but I did some experiments on the P8 and I was
unable to distinguish the performance of a load vs. the materialization
sequence. This matches my expectations: The load should have a load-to-use
latency of 3 cycles. The materialization-sequence instructions can issue
together, each instruction has only a 1-cycle latency with forwarding (and 2 can
execute per cycle), and the height of the dependency chain is 3 instructions.
All other things being roughly equal, keeping pressure off of the memory
subsystem tends to trump other concerns.

-Hal 
> On Mon, Sep 12, 2016 at 6:51 PM, vivek pandya via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
> > Hello Developers,
> 
> > I am working with my other batchmates to improve register remat in
> > LLVM.
> 
> > We want to remat live ranges made of multiple instruction.
> 
> > Just to support our proposal here is a simple example that
> > currently
> > remat does
> 
> > not cover
> 
> > $ cat ~/tmp/tl.c
> 
> > void foo(long);
> 
> > void bar() {
> 
> > for (int i = 0; i < 1600; ++i)
> 
> > foo(3494348345984503943);
> 
> > }
> 
> > $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
> 
> > ...
> 
> > # BB#0: # %entry
> 
> > ...
> 
> > lis 3, 12414
> 
> > ori 3, 3, 27470
> 
> > sldi 3, 3, 32
> 
> > oris 3, 3, 35809
> 
> > ori 30, 3, 20615
> 
> > ...
> 
> > .LBB0_1: # %for.body
> 
> > mr 3, 30
> 
> > bl foo
> 
> > ...
> 
> > There is a sequence of instructions used to materialize the
> > constant,
> > the first
> 
> > one (the lis) is trivially rematerialiable, and the others depend
> > only on that one,
> 
> > and have no side effects. If we otherwise needed to spill the
> > constant, we might
> 
> > wish to move the entire set of instructions that compute the value
> > into the loop body.
> 
> > (Many thanks to Hal Finkel for this example and head start)
> 
> > We are following very old but effective paper
"Rematerialization"
> 
> > http://dl.acm.org/citation.cfm?id=143143
> > ------------------------------[1]
> 
> > This extension will specially improve code quality for RICS
> > backends
> > like
> 
> > powerpc, MIPS, ARM, AArch64 etc.
> 
> > Here is a tentative apporach ( after reading the above mentioned
> > paper and current remat code) that I would like to follow.
> 
> > Please share your views because this may be totally wrong
> > direction.
> > Also I will
> 
> > be happy if this gets into main line LLVM code but if community
> > don't
> > want
> 
> > to make remat heavy than please guide me for my class project
> > perspective.
> 
> > 1 ) As LLVM MI is already in SSA form before reg allocation so for
> > LLVM I think it does not require to build SSA graph and converting
> > it back after optimization completed as mentioned in [1]
> 
> > 2 ) We would like to add a pass similar to SCCP.cpp (Sparse
> > Conditional Constant
> 
> > Propagation based on Wegman and Zadeck's work
> > http://dl.acm.org/citation.cfm?id=103136 ) as desribed in [1]. This
> > pass will be scheduled to run before register allocation.
> 
> > 3 ) Output of the pass added in Step 2 will be a Map of def to
> > instructions pointers (instructions which can be used to remat the
> > given live range). The map will contain live ranges which is due to
> > single instruction and multiple instructions.
> 
> > 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis
> > from the Map
> 
> > when a spill is required for RA.
> 
> > 5 ) The remat transformation APIs like rematerializeAt() will be
> > teached to remat
> 
> > live ranges with multiple instructions too.
> 
> > 6 ) A cost analysis will be require to decide between remat and
> > spill. This should be based on at least two factors register
> > pressure and spill cost
> 
> > Few points:
> 
> > --------------
> 
> > * The analysis pass to be addes as per (2) will use target specific
> > information
> 
> > from TargetInstrInfo.cpp as the current remat infrastructure uses.
> 
> > * This approach will not be on demand as the current approach is
> > (i.e
> > remat specific
> 
> > code will be executed only if there is a spill) so the pass in (2)
> > can be an
> 
> > overhead so we may want it to enable only for higher level of
> > optimization.
> 
> > * Will it be possible to use existing SCCP.cpp code with few
> > modification to lattice
> 
> > and related mathematical operation so that it can serve both
> > purpose?
> 
> > * No changes in current register allocators or spill framework will
> > be required
> 
> > because remat entry point will be LiveRangeEdit.
> 
> > Any other way with less overhead is always welcomed.
> 
> > Please help us developing a plan to implement this.
> 
> > Hoping for comments!
> 
> > Sincerely,
> 
> > Vivek
> 
> > _______________________________________________
> 
> > LLVM Developers mailing list
> 
> > llvm-dev at lists.llvm.org
> 
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160924/9fcb559e/attachment.html>

llvm dev - Sep 2016 - [RFC] Register Rematerialization (remat) Extension

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

[llvm-dev] [RFC] Register Rematerialization (remat) Extension

[llvm-dev] [RFC] Register Rematerialization (remat) Extension