vivek pandya via llvm-dev
2016-Sep-12 15:51 UTC
[llvm-dev] [RFC] Register Rematerialization (remat) Extension
Hello Developers, I am working with my other batchmates to improve register remat in LLVM. We want to remat live ranges made of multiple instruction. Just to support our proposal here is a simple example that currently remat does not cover $ cat ~/tmp/tl.c void foo(long); void bar() { for (int i = 0; i < 1600; ++i) foo(3494348345984503943); } $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64 ... # BB#0: # %entry ... lis 3, 12414 ori 3, 3, 27470 sldi 3, 3, 32 oris 3, 3, 35809 ori 30, 3, 20615 ... .LBB0_1: # %for.body mr 3, 30 bl foo ... There is a sequence of instructions used to materialize the constant, the first one (the lis) is trivially rematerialiable, and the others depend only on that one, and have no side effects. If we otherwise needed to spill the constant, we might wish to move the entire set of instructions that compute the value into the loop body. (Many thanks to Hal Finkel for this example and head start) We are following very old but effective paper "Rematerialization" http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1] This extension will specially improve code quality for RICS backends like powerpc, MIPS, ARM, AArch64 etc. Here is a tentative apporach ( after reading the above mentioned paper and current remat code) that I would like to follow. Please share your views because this may be totally wrong direction. Also I will be happy if this gets into main line LLVM code but if community don't want to make remat heavy than please guide me for my class project perspective. 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I think it does not require to build SSA graph and converting it back after optimization completed as mentioned in [1] 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional Constant Propagation based on Wegman and Zadeck's work http://dl.acm.org/citation.cfm?id=103136) as desribed in [1]. This pass will be scheduled to run before register allocation. 3 ) Output of the pass added in Step 2 will be a Map of def to instructions pointers (instructions which can be used to remat the given live range). The map will contain live ranges which is due to single instruction and multiple instructions. 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the Map when a spill is required for RA. 5 ) The remat transformation APIs like rematerializeAt() will be teached to remat live ranges with multiple instructions too. 6 ) A cost analysis will be require to decide between remat and spill. This should be based on at least two factors register pressure and spill cost Few points: -------------- * The analysis pass to be addes as per (2) will use target specific information from TargetInstrInfo.cpp as the current remat infrastructure uses. * This approach will not be on demand as the current approach is (i.e remat specific code will be executed only if there is a spill) so the pass in (2) can be an overhead so we may want it to enable only for higher level of optimization. * Will it be possible to use existing SCCP.cpp code with few modification to lattice and related mathematical operation so that it can serve both purpose? * No changes in current register allocators or spill framework will be required because remat entry point will be LiveRangeEdit. Any other way with less overhead is always welcomed. Please help us developing a plan to implement this. Hoping for comments! Sincerely, Vivek -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160912/215d72cf/attachment.html>
Andrew Trick via llvm-dev
2016-Sep-12 17:14 UTC
[llvm-dev] [RFC] Register Rematerialization (remat) Extension
> On Sep 12, 2016, at 8:51 AM, vivek pandya via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I think it does not require to build SSA graph and converting it back after optimization completed as mentioned in [1] > > 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional Constant > Propagation based on Wegman and Zadeck's work http://dl.acm.org/citation.cfm?id=103136 <http://dl.acm.org/citation.cfm?id=103136>) as desribed in [1]. This pass will be scheduled to run before register allocation. > > 3 ) Output of the pass added in Step 2 will be a Map of def to instructions pointers (instructions which can be used to remat the given live range). The map will contain live ranges which is due to single instruction and multiple instructions.LiveIntervals maintains a quasi-SSA form via VNInfo. It does not allow efficient def-use queries, but use-def is there, which is all that you should need. It would be great to have better remat during regalloc, but please try to avoid building additional state that needs to be maintained. -Andy> > 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the Map > when a spill is required for RA. > > 5 ) The remat transformation APIs like rematerializeAt() will be teached to remat > live ranges with multiple instructions too. > > 6 ) A cost analysis will be require to decide between remat and spill. This should be based on at least two factors register pressure and spill cost > > Few points: > -------------- > * The analysis pass to be addes as per (2) will use target specific information > from TargetInstrInfo.cpp as the current remat infrastructure uses. > > * This approach will not be on demand as the current approach is (i.e remat specific > code will be executed only if there is a spill) so the pass in (2) can be an > overhead so we may want it to enable only for higher level of optimization. > > * Will it be possible to use existing SCCP.cpp code with few modification to lattice > and related mathematical operation so that it can serve both purpose? > > * No changes in current register allocators or spill framework will be required > because remat entry point will be LiveRangeEdit. > > Any other way with less overhead is always welcomed. > Please help us developing a plan to implement this. > > Hoping for comments! > > Sincerely, > Vivek > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160912/e2228945/attachment.html>
Philip Reames via llvm-dev
2016-Sep-13 14:50 UTC
[llvm-dev] [RFC] Register Rematerialization (remat) Extension
On 09/12/2016 08:51 AM, vivek pandya via llvm-dev wrote:> Hello Developers, > > I am working with my other batchmates to improve register remat in LLVM. > We want to remat live ranges made of multiple instruction. > > Just to support our proposal here is a simple example that currently > remat does > not cover > > $ cat ~/tmp/tl.c > void foo(long); > void bar() { > for (int i = 0; i < 1600; ++i) > foo(3494348345984503943); > } > > $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64 > ... > # BB#0: # %entry > ... > lis 3, 12414 > ori 3, 3, 27470 > sldi 3, 3, 32 > oris 3, 3, 35809 > ori 30, 3, 20615 > ... > .LBB0_1: # %for.body > mr 3, 30 > bl foo > ...OT: Instead of viewing this as a *single* remat problem, can you view this as a series of remat problems? If each instruction can be moved while only increasing the live range of a single register and shrinking the live range of another, then moving each instruction one by one and iterating gives the same effect. Note that this only (obviously) works for single use defs. Extending it to multiple uses would require a more complicated cost model as you point out.> > There is a sequence of instructions used to materialize the constant, > the first > one (the lis) is trivially rematerialiable, and the others depend only > on that one, > and have no side effects. If we otherwise needed to spill the > constant, we might > wish to move the entire set of instructions that compute the value > into the loop body. > (Many thanks to Hal Finkel for this example and head start) > > We are following very old but effective paper "Rematerialization" > http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1] > > This extension will specially improve code quality for RICS backends like > powerpc, MIPS, ARM, AArch64 etc. > > Here is a tentative apporach ( after reading the above mentioned paper > and current remat code) that I would like to follow. > > Please share your views because this may be totally wrong direction. > Also I will > be happy if this gets into main line LLVM code but if community don't want > to make remat heavy than please guide me for my class project perspective. > > 1 ) As LLVM MI is already in SSA form before reg allocation so for > LLVM I think it does not require to build SSA graph and converting it > back after optimization completed as mentioned in [1] > > 2 ) We would like to add a pass similar to SCCP.cpp (Sparse > Conditional Constant > Propagation based on Wegman and Zadeck's work > http://dl.acm.org/citation.cfm?id=103136) as desribed in [1]. This > pass will be scheduled to run before register allocation. > > 3 ) Output of the pass added in Step 2 will be a Map of def to > instructions pointers (instructions which can be used to remat the > given live range). The map will contain live ranges which is due to > single instruction and multiple instructions.This sounds overly complex. Can you implement this without needing the new side structure? Maintaining extra state and keeping it up to date is expensive. (From a maintenance and code complexity perspective.)> > 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from > the Map > when a spill is required for RA. > > 5 ) The remat transformation APIs like rematerializeAt() will be > teached to remat > live ranges with multiple instructions too. > > 6 ) A cost analysis will be require to decide between remat and spill. > This should be based on at least two factors register pressure and > spill cost > > Few points: > -------------- > * The analysis pass to be addes as per (2) will use target specific > information > from TargetInstrInfo.cpp as the current remat infrastructure uses. > > * This approach will not be on demand as the current approach is (i.e > remat specific > code will be executed only if there is a spill) so the pass in (2) can > be an > overhead so we may want it to enable only for higher level of > optimization.This would be unfortunate. Not fatal, just unfortunate.> > * Will it be possible to use existing SCCP.cpp code with few > modification to lattice > and related mathematical operation so that it can serve both purpose? > > * No changes in current register allocators or spill framework will be > required > because remat entry point will be LiveRangeEdit. > > Any other way with less overhead is always welcomed. > Please help us developing a plan to implement this. > > Hoping for comments! > > Sincerely, > Vivek > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160913/92f9c0bb/attachment.html>
vivek pandya via llvm-dev
2016-Sep-13 16:03 UTC
[llvm-dev] [RFC] Register Rematerialization (remat) Extension
On Tue, Sep 13, 2016 at 8:20 PM, Philip Reames <listmail at philipreames.com> wrote:> > > On 09/12/2016 08:51 AM, vivek pandya via llvm-dev wrote: > > Hello Developers, > > I am working with my other batchmates to improve register remat in LLVM. > We want to remat live ranges made of multiple instruction. > > Just to support our proposal here is a simple example that currently remat > does > not cover > > $ cat ~/tmp/tl.c > void foo(long); > void bar() { > for (int i = 0; i < 1600; ++i) > foo(3494348345984503943); > } > > $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64 > ... > # BB#0: # %entry > ... > lis 3, 12414 > ori 3, 3, 27470 > sldi 3, 3, 32 > oris 3, 3, 35809 > ori 30, 3, 20615 > ... > .LBB0_1: # %for.body > mr 3, 30 > bl foo > ... > > OT: Instead of viewing this as a *single* remat problem, can you view this > as a series of remat problems? If each instruction can be moved while only > increasing the live range of a single register and shrinking the live range > of another, then moving each instruction one by one and iterating gives the > same effect. Note that this only (obviously) works for single use defs. > Extending it to multiple uses would require a more complicated cost model > as you point out. >Hello Philip, Your idea seems to be effective and can be tried out with few changes. But correct me if I got this wrong, by multiple uses do you mean a same remat sequence being remat again? In that case we can run the analysis again to decide which all instructions are required to be remat or we may add a simple caching layer which can hold analysis result for once compilation unit. Any thoughts ?> > > > There is a sequence of instructions used to materialize the constant, the > first > one (the lis) is trivially rematerialiable, and the others depend only on > that one, > and have no side effects. If we otherwise needed to spill the constant, we > might > wish to move the entire set of instructions that compute the value into > the loop body. > (Many thanks to Hal Finkel for this example and head start) > > We are following very old but effective paper "Rematerialization" > http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1] > > This extension will specially improve code quality for RICS backends like > powerpc, MIPS, ARM, AArch64 etc. > > Here is a tentative apporach ( after reading the above mentioned paper and > current remat code) that I would like to follow. > > Please share your views because this may be totally wrong direction. Also > I will > be happy if this gets into main line LLVM code but if community don't want > to make remat heavy than please guide me for my class project perspective. > > 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I > think it does not require to build SSA graph and converting it back after > optimization completed as mentioned in [1] > > 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional > Constant > Propagation based on Wegman and Zadeck's work http://dl.acm.org/citation. > cfm?id=103136) as desribed in [1]. This pass will be scheduled to run > before register allocation. > > 3 ) Output of the pass added in Step 2 will be a Map of def to > instructions pointers (instructions which can be used to remat the given > live range). The map will contain live ranges which is due to single > instruction and multiple instructions. > > This sounds overly complex. Can you implement this without needing the > new side structure? Maintaining extra state and keeping it up to date is > expensive. (From a maintenance and code complexity perspective.) > > > Currently I don't think we have any facility to annotate instructions soto keep analysis available till register allocation we need an immutable pass.> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the > Map > when a spill is required for RA. > > 5 ) The remat transformation APIs like rematerializeAt() will be teached > to remat > live ranges with multiple instructions too. > > 6 ) A cost analysis will be require to decide between remat and spill. > This should be based on at least two factors register pressure and spill > cost > > Few points: > -------------- > * The analysis pass to be addes as per (2) will use target specific > information > from TargetInstrInfo.cpp as the current remat infrastructure uses. > > * This approach will not be on demand as the current approach is (i.e > remat specific > code will be executed only if there is a spill) so the pass in (2) can be > an > overhead so we may want it to enable only for higher level of optimization. > > This would be unfortunate. Not fatal, just unfortunate. > > > -Vivek> * Will it be possible to use existing SCCP.cpp code with few modification > to lattice > and related mathematical operation so that it can serve both purpose? > > * No changes in current register allocators or spill framework will be > required > because remat entry point will be LiveRangeEdit. > > Any other way with less overhead is always welcomed. > Please help us developing a plan to implement this. > > Hoping for comments! > > Sincerely, > Vivek > > > > _______________________________________________ > LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160913/6f6a6105/attachment.html>
Gerolf Hoflehner via llvm-dev
2016-Sep-14 01:42 UTC
[llvm-dev] [RFC] Register Rematerialization (remat) Extension
> On Sep 12, 2016, at 10:14 AM, Andrew Trick via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > >> On Sep 12, 2016, at 8:51 AM, vivek pandya via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> >> 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I think it does not require to build SSA graph and converting it back after optimization completed as mentioned in [1] >> >> 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional Constant >> Propagation based on Wegman and Zadeck's work http://dl.acm.org/citation.cfm?id=103136 <http://dl.acm.org/citation.cfm?id=103136>) as desribed in [1]. This pass will be scheduled to run before register allocation. >> >> 3 ) Output of the pass added in Step 2 will be a Map of def to instructions pointers (instructions which can be used to remat the given live range). The map will contain live ranges which is due to single instruction and multiple instructions. > > LiveIntervals maintains a quasi-SSA form via VNInfo. It does not allow efficient def-use queries, but use-def is there, which is all that you should need.I also only see a narrow and specific remat cost problem in the example: is it cheaper is issue a chain of instructions rather than a fill? And for this a use-def is sufficient.> > It would be great to have better remat during regalloc, but please try to avoid building additional state that needs to be maintained.You proposed a fairly complex scheme. The question then is always is it worth it? To answer that question you would need to investigate and break down the current remat problems (spills but should remat, remat but should spill, should remat at a different location, etc) eg. for the llvm test suite, and show that your algorithms solves the most important ones.> > -Andy > >> >> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the Map >> when a spill is required for RA. >> >> 5 ) The remat transformation APIs like rematerializeAt() will be teached to remat >> live ranges with multiple instructions too. >> >> 6 ) A cost analysis will be require to decide between remat and spill. This should be based on at least two factors register pressure and spill cost >> >> Few points: >> -------------- >> * The analysis pass to be addes as per (2) will use target specific information >> from TargetInstrInfo.cpp as the current remat infrastructure uses. >> >> * This approach will not be on demand as the current approach is (i.e remat specific >> code will be executed only if there is a spill) so the pass in (2) can be an >> overhead so we may want it to enable only for higher level of optimization. >> >> * Will it be possible to use existing SCCP.cpp code with few modification to lattice >> and related mathematical operation so that it can serve both purpose? >> >> * No changes in current register allocators or spill framework will be required >> because remat entry point will be LiveRangeEdit. >> >> Any other way with less overhead is always welcomed. >> Please help us developing a plan to implement this. >> >> Hoping for comments! >> >> Sincerely, >> Vivek >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160913/5d1b1af6/attachment.html>
Bruce Hoult via llvm-dev
2016-Sep-19 14:10 UTC
[llvm-dev] [RFC] Register Rematerialization (remat) Extension
The idea seems sound, but do you really have a CPU in which such a complex rematerialization is better than an L1 cache load from the stack frame? lis 3, 12414 ori 3, 3, 27470 sldi 3, 3, 32 oris 3, 3, 35809 ori 30, 3, 20615 I'm not familiar with modern PPC64 but seems like a lose on PPC G5 and from the docs I quickly found (2 cycle latency on dependent int ALU ops) Power8 too. OK, maybe (if I didn't screw up the rldimi): lis 3, 12414 lis 30, 35809 ori 3, 3, 27470 ori 30, 30, 20615 rldimi 30, 3, 32, 0 Or is there something that optimizes such sequences building constants? On Mon, Sep 12, 2016 at 6:51 PM, vivek pandya via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hello Developers, > > I am working with my other batchmates to improve register remat in LLVM. > We want to remat live ranges made of multiple instruction. > > Just to support our proposal here is a simple example that currently remat > does > not cover > > $ cat ~/tmp/tl.c > void foo(long); > void bar() { > for (int i = 0; i < 1600; ++i) > foo(3494348345984503943); > } > > $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64 > ... > # BB#0: # %entry > ... > lis 3, 12414 > ori 3, 3, 27470 > sldi 3, 3, 32 > oris 3, 3, 35809 > ori 30, 3, 20615 > ... > .LBB0_1: # %for.body > mr 3, 30 > bl foo > ... > > There is a sequence of instructions used to materialize the constant, the > first > one (the lis) is trivially rematerialiable, and the others depend only on > that one, > and have no side effects. If we otherwise needed to spill the constant, we > might > wish to move the entire set of instructions that compute the value into > the loop body. > (Many thanks to Hal Finkel for this example and head start) > > We are following very old but effective paper "Rematerialization" > http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1] > > This extension will specially improve code quality for RICS backends like > powerpc, MIPS, ARM, AArch64 etc. > > Here is a tentative apporach ( after reading the above mentioned paper and > current remat code) that I would like to follow. > > Please share your views because this may be totally wrong direction. Also > I will > be happy if this gets into main line LLVM code but if community don't want > to make remat heavy than please guide me for my class project perspective. > > 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I > think it does not require to build SSA graph and converting it back after > optimization completed as mentioned in [1] > > 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional > Constant > Propagation based on Wegman and Zadeck's work http://dl.acm.org/citation. > cfm?id=103136) as desribed in [1]. This pass will be scheduled to run > before register allocation. > > 3 ) Output of the pass added in Step 2 will be a Map of def to > instructions pointers (instructions which can be used to remat the given > live range). The map will contain live ranges which is due to single > instruction and multiple instructions. > > 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the > Map > when a spill is required for RA. > > 5 ) The remat transformation APIs like rematerializeAt() will be teached > to remat > live ranges with multiple instructions too. > > 6 ) A cost analysis will be require to decide between remat and spill. > This should be based on at least two factors register pressure and spill > cost > > Few points: > -------------- > * The analysis pass to be addes as per (2) will use target specific > information > from TargetInstrInfo.cpp as the current remat infrastructure uses. > > * This approach will not be on demand as the current approach is (i.e > remat specific > code will be executed only if there is a spill) so the pass in (2) can be > an > overhead so we may want it to enable only for higher level of optimization. > > * Will it be possible to use existing SCCP.cpp code with few modification > to lattice > and related mathematical operation so that it can serve both purpose? > > * No changes in current register allocators or spill framework will be > required > because remat entry point will be LiveRangeEdit. > > Any other way with less overhead is always welcomed. > Please help us developing a plan to implement this. > > Hoping for comments! > > Sincerely, > Vivek > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160919/136b0901/attachment.html>
Hal Finkel via llvm-dev
2016-Sep-24 17:39 UTC
[llvm-dev] [RFC] Register Rematerialization (remat) Extension
----- Original Message -----> From: "Bruce Hoult" <bruce at hoult.org> > To: "vivek pandya" <vivekvpandya at gmail.com> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Hal Finkel" > <hfinkel at anl.gov>, "Matthias Braun" <matze at braunis.de> > Sent: Monday, September 19, 2016 9:10:17 AM > Subject: Re: [llvm-dev] [RFC] Register Rematerialization (remat) > Extension> The idea seems sound, but do you really have a CPU in which such a > complex rematerialization is better than an L1 cache load from the > stack frame?> lis 3, 12414 > ori 3, 3, 27470 > sldi 3, 3, 32 > oris 3, 3, 35809 > ori 30, 3, 20615> I'm not familiar with modern PPC64 but seems like a lose on PPC G5 > and from the docs I quickly found (2 cycle latency on dependent int > ALU ops) Power8 too.> OK, maybe (if I didn't screw up the rldimi):> lis 3, 12414 > lis 30, 35809 > ori 3, 3, 27470 > ori 30, 30, 20615> rldimi 30, 3, 32, 0> Or is there something that optimizes such sequences building > constants?I don't know about the G5, but I did some experiments on the P8 and I was unable to distinguish the performance of a load vs. the materialization sequence. This matches my expectations: The load should have a load-to-use latency of 3 cycles. The materialization-sequence instructions can issue together, each instruction has only a 1-cycle latency with forwarding (and 2 can execute per cycle), and the height of the dependency chain is 3 instructions. All other things being roughly equal, keeping pressure off of the memory subsystem tends to trump other concerns. -Hal> On Mon, Sep 12, 2016 at 6:51 PM, vivek pandya via llvm-dev < > llvm-dev at lists.llvm.org > wrote:> > Hello Developers, >> > I am working with my other batchmates to improve register remat in > > LLVM. > > > We want to remat live ranges made of multiple instruction. >> > Just to support our proposal here is a simple example that > > currently > > remat does > > > not cover >> > $ cat ~/tmp/tl.c > > > void foo(long); > > > void bar() { > > > for (int i = 0; i < 1600; ++i) > > > foo(3494348345984503943); > > > } >> > $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64 > > > ... > > > # BB#0: # %entry > > > ... > > > lis 3, 12414 > > > ori 3, 3, 27470 > > > sldi 3, 3, 32 > > > oris 3, 3, 35809 > > > ori 30, 3, 20615 > > > ... > > > .LBB0_1: # %for.body > > > mr 3, 30 > > > bl foo > > > ... >> > There is a sequence of instructions used to materialize the > > constant, > > the first > > > one (the lis) is trivially rematerialiable, and the others depend > > only on that one, > > > and have no side effects. If we otherwise needed to spill the > > constant, we might > > > wish to move the entire set of instructions that compute the value > > into the loop body. > > > (Many thanks to Hal Finkel for this example and head start) >> > We are following very old but effective paper "Rematerialization" > > > http://dl.acm.org/citation.cfm?id=143143 > > ------------------------------[1] >> > This extension will specially improve code quality for RICS > > backends > > like > > > powerpc, MIPS, ARM, AArch64 etc. >> > Here is a tentative apporach ( after reading the above mentioned > > paper and current remat code) that I would like to follow. >> > Please share your views because this may be totally wrong > > direction. > > Also I will > > > be happy if this gets into main line LLVM code but if community > > don't > > want > > > to make remat heavy than please guide me for my class project > > perspective. >> > 1 ) As LLVM MI is already in SSA form before reg allocation so for > > LLVM I think it does not require to build SSA graph and converting > > it back after optimization completed as mentioned in [1] >> > 2 ) We would like to add a pass similar to SCCP.cpp (Sparse > > Conditional Constant > > > Propagation based on Wegman and Zadeck's work > > http://dl.acm.org/citation.cfm?id=103136 ) as desribed in [1]. This > > pass will be scheduled to run before register allocation. >> > 3 ) Output of the pass added in Step 2 will be a Map of def to > > instructions pointers (instructions which can be used to remat the > > given live range). The map will contain live ranges which is due to > > single instruction and multiple instructions. >> > 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis > > from the Map > > > when a spill is required for RA. >> > 5 ) The remat transformation APIs like rematerializeAt() will be > > teached to remat > > > live ranges with multiple instructions too. >> > 6 ) A cost analysis will be require to decide between remat and > > spill. This should be based on at least two factors register > > pressure and spill cost >> > Few points: > > > -------------- > > > * The analysis pass to be addes as per (2) will use target specific > > information > > > from TargetInstrInfo.cpp as the current remat infrastructure uses. >> > * This approach will not be on demand as the current approach is > > (i.e > > remat specific > > > code will be executed only if there is a spill) so the pass in (2) > > can be an > > > overhead so we may want it to enable only for higher level of > > optimization. >> > * Will it be possible to use existing SCCP.cpp code with few > > modification to lattice > > > and related mathematical operation so that it can serve both > > purpose? >> > * No changes in current register allocators or spill framework will > > be required > > > because remat entry point will be LiveRangeEdit. >> > Any other way with less overhead is always welcomed. > > > Please help us developing a plan to implement this. >> > Hoping for comments! >> > Sincerely, > > > Vivek >> > _______________________________________________ > > > LLVM Developers mailing list > > > llvm-dev at lists.llvm.org > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160924/9fcb559e/attachment.html>