Matthias Braun via llvm-dev
2015-Sep-29 23:15 UTC
[llvm-dev] TwoAddressInstructionPass::isProfitableToConv3Addr()
A similar setting occurs with ARM Thumb code which for many instructions has a short 2-address encoding and a longer 3 address form. As far as I know this is done by selecting the 3 address form and rewriting them to 2-address after register allocation where possible. See lib/Target/ARM/Thumb2SizeReduction.cpp.

- Matthias

> On Sep 29, 2015, at 2:22 PM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi Jonas,
>
>> On Sep 29, 2015, at 2:00 AM, Jonas Paulsson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> I have cases of instruction pairs, where one is cheaper 2-address, and the other 3-address. I would like to select the 2-addr instruction during isel, but use the 3-addr instruction to avoid a copy if possible. I find that TwoAddressInstructionPass::isProfitableToConv3Addr() is only checking for the case of a physreg copy, and so leaves the majority of cases as they are (2-address).
>>
>> I would like to say "If 3-addr version would avoid a copy, use it!". Does anyone else have a similar situation?
>
> I think this is what it is supposed to do right now :). Though I reckon the test is probably over conservative in the sense that it returns true only if it can prove this is going to save a copy.
>
>> To do this, one would need to check the kill-flag on the tied use operand. If it is not killed, one can assume that the use and dst registers overlap, and therefore the copy is needed for the two-address form. The kill flags would however need to be recomputed by TwoAddr pass, since LiveVariables clear them.
>>
>> An alternative approach might be to have something like TII->handleMachineFunctionPostCoalescer() at the end of RegisterCoalescer.cpp::runOnMachineFunction(). There, one could look for instructions and query live intervals for overlap. This hook might also be useful for other things, since this is the point just before mi-sched/regalloc, where one could do things like estimate register pressure.
>>
>> Any comments on this anyone?
>
> We could try to fix the check in two-address pass first. I believe a hook like you describe might be useful but this is yet another thing to teach the coalescer, which is already complex enough IMO. Moreover, I like the separation of concerns that 2- and 3-addr conversions are made within a dedicated pass.
> That being said, if getting the best code involves teaching the coalescer about this transformation, sure!
>
> Cheers,
> Q.
>
>> /Jonas Paulsson
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
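The late rewrite described in this message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `AllocatedInst` and `tryNarrowToTwoAddress` are hypothetical names invented for illustration, not LLVM types or anything taken from Thumb2SizeReduction.cpp. It shows why the conversion is opportunistic: it only succeeds when register allocation happened to assign the destination the same register as one of the sources.

```cpp
#include <cassert>
#include <utility>

// Toy model of an already register-allocated instruction Dst = Src1 op Src2.
// All names are hypothetical and exist only for this illustration.
struct AllocatedInst {
  unsigned Dst;      // physical register holding the result
  unsigned Src1;     // first source register
  unsigned Src2;     // second source register
  bool Commutative;  // true if Src1 and Src2 may be swapped
  bool TwoAddress;   // set once the short 2-address form is chosen
};

// Try to narrow a 3-address instruction to the form Dst = Dst op Src2.
// Returns true if the instruction was rewritten.
bool tryNarrowToTwoAddress(AllocatedInst &I) {
  assert(!I.TwoAddress && "expects an instruction still in 3-address form");
  // For commutative operations, swapping sources may expose a match.
  if (I.Dst != I.Src1 && I.Commutative && I.Dst == I.Src2)
    std::swap(I.Src1, I.Src2);
  // Without a matching source register the long form must be kept.
  if (I.Dst != I.Src1)
    return false;
  I.TwoAddress = true;
  return true;
}
```

With registers (Dst, Src1, Src2) = (1, 1, 2) the rewrite succeeds directly; with (2, 1, 2) it succeeds only for a commutative operation; with (3, 1, 2) the instruction keeps its longer encoding.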
Jonas Paulsson via llvm-dev
2015-Sep-30 09:15 UTC
[llvm-dev] TwoAddressInstructionPass::isProfitableToConv3Addr()
On 2015-09-30 01:15, Matthias Braun wrote:
> A similar setting occurs with ARM Thumb code which for many instructions has a short 2-address encoding and a longer 3 address form. As far as I know this is done by selecting the 3 address form and rewriting them to 2-address after register allocation where possible. See lib/Target/ARM/Thumb2SizeReduction.cpp.
>
> - Matthias

The late opportunistic conversion is simple, but can only work in cases where regalloc happens to put the new definition in the same register as one of the source operands. If regalloc tries to do this as much as possible, this might work most of the time, but I have no idea whether that is the case. Some targets may want round-robin allocation, while others would prefer reuse of registers. Steve says he gets most of it handled, how about ARM Thumb? Is this with RAGreedy?

>> On Sep 29, 2015, at 2:22 PM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Jonas,
>>
>>> On Sep 29, 2015, at 2:00 AM, Jonas Paulsson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> Hi,
>>>
>>> I have cases of instruction pairs, where one is cheaper 2-address, and the other 3-address. I would like to select the 2-addr instruction during isel, but use the 3-addr instruction to avoid a copy if possible. I find that TwoAddressInstructionPass::isProfitableToConv3Addr() is only checking for the case of a physreg copy, and so leaves the majority of cases as they are (2-address).
>>>
>>> I would like to say "If 3-addr version would avoid a copy, use it!". Does anyone else have a similar situation?
>>
>> I think this is what it is supposed to do right now :). Though I reckon the test is probably over conservative in the sense that it returns true only if it can prove this is going to save a copy.

Yes, it is very much overconservative because it only checks the cases of phys-reg copies around calls and returns, meaning that all other cases are never transformed. I believe it should ask the target to convert to 3-address in all cases where no source register is killed.

PPCVSXFMAMutate is indeed doing something towards the same goal. I wonder if this pass could be removed/simplified if TwoAddress were aware of kill flags and eliminated more copies?

/ Jonas
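The heuristic proposed in this message (convert to 3-address whenever the tied source register is not killed) might be sketched as follows. This is a toy model, not the actual isProfitableToConv3Addr() logic: `TiedUse`, `Form`, `twoAddressNeedsCopy` and `chooseForm` are hypothetical names, and the sketch assumes the kill flag is accurate, which the thread notes would require recomputing kill flags inside the TwoAddress pass.

```cpp
// Hypothetical stand-ins for illustration; these are not LLVM classes.
struct TiedUse {
  unsigned Reg;  // virtual register feeding the tied operand
  bool IsKill;   // true if this instruction is the last use of Reg
};

enum class Form { TwoAddress, ThreeAddress };

// A 2-address instruction overwrites its tied source. If that source is
// still live afterwards (not killed), a copy must be inserted first.
bool twoAddressNeedsCopy(const TiedUse &Tied) { return !Tied.IsKill; }

// Heuristic from the thread: pick the 3-address form exactly when it
// avoids that copy and the target actually provides such a form.
Form chooseForm(const TiedUse &Tied, bool HasThreeAddrForm) {
  if (HasThreeAddrForm && twoAddressNeedsCopy(Tied))
    return Form::ThreeAddress;
  return Form::TwoAddress;
}
```

In this model the cheaper 2-address form is kept whenever the tied source dies at the instruction, so the 3-address form is only paid for when it replaces a copy.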
Hal Finkel via llvm-dev
2015-Oct-05 17:16 UTC
[llvm-dev] TwoAddressInstructionPass::isProfitableToConv3Addr()
----- Original Message -----
> From: "Jonas Paulsson" <paulsson at linux.vnet.ibm.com>
> To: "Matthias Braun" <mbraun at apple.com>, "Quentin Colombet" <qcolombet at apple.com>, "Steve King" <steve at metrokings.com>, hfinkel at anl.gov
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Wednesday, September 30, 2015 4:15:25 AM
> Subject: Re: [llvm-dev] TwoAddressInstructionPass::isProfitableToConv3Addr()
>
> [...]
>
> Yes, it is very much overconservative because it only checks the cases of phys-reg copies around calls and returns, meaning that all other cases are never transformed. I believe it should ask target to convert to 3-address in all cases no source register is killed.
>
> PPCVSXFMAMutate is indeed doing something towards the same goal. I wonder if this pass could be removed/simplified if TwoAddress would be aware of kill flags and eliminate more copies?

PPCVSXFMAMutate runs in between MI scheduling and register allocation. In this way, it only eliminates copies that actually remain after scheduling. TwoAddress runs prior to MI scheduling. If I have:

  c1 = c
  x = a*b + c <tied>
  y = d*e + c1 <tied>

I might mutate the first instruction so that I have:

  x = a*(b <tied>) + c
  y = d*e + c <tied>

to eliminate the copy. However, in order to hide latency, the scheduler might prefer to flip the two FMA instructions, which it now cannot do if I've mutated first without re-introducing the copy I was trying to avoid.

 -Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
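The scheduling constraint in the FMA example can be checked with a small dependence test. This is a toy sketch: `Inst` and `canSwap` are hypothetical names, registers are plain strings, and a tied definition is modelled as overwriting the register of its tied source. None of this is taken from PPCVSXFMAMutate or the MI scheduler.

```cpp
#include <set>
#include <string>

// Toy instruction: one register written, a set of registers read.
// Hypothetical model for illustration only.
struct Inst {
  std::string Def;             // register overwritten by the result
  std::set<std::string> Uses;  // registers read
};

// True if B, which currently follows A, may be moved in front of A
// without changing any value that either instruction observes.
bool canSwap(const Inst &A, const Inst &B) {
  if (B.Uses.count(A.Def)) return false;  // B reads A's result
  if (A.Uses.count(B.Def)) return false;  // B overwrites a value A reads
  if (A.Def == B.Def) return false;       // both write the same register
  return true;
}
```

Modelling the version with the copy as x = {Def "c", Uses a, b, c} and y = {Def "c1", Uses d, e, c1}, the two FMAs are independent and may be flipped. After the mutation, x = {Def "b", Uses a, b, c} and y = {Def "c", Uses d, e, c}: y overwrites c, which x still reads, so y can no longer be moved ahead of x.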