After playing a bit with the newly introduced hardware loop framework I realize that the llvm.set.loop.iterations intrinsic takes as its argument the number of iterations the loop will execute. In fact, the pass goes as far as inserting, in the IR, an addition of the constant 1 to the backedge-taken count returned by SCEV.

If the machine instruction realizing the loop is interested in the number of branches to make (backedges taken), then this is slightly awkward as we need to subtract the constant 1 again. Of course, if the iteration count is a constant this is trivial, but if it is passed in a register then it is not so nice to have to insert these subtract instructions from a MIR pass (where the hwloop finalization is being done).

I wonder what would be the best way to deal with this.

One way would be to add a TTI hook gating the original addition, but then the intrinsic would have two meanings depending on what this hook returns, which is not good.

Another way would be to introduce a second intrinsic, say llvm.set.loop.backedges, that corresponds to that value, but then we have yet another intrinsic.

A third option could be to have the original intrinsic take both values as arguments.

Any thoughts on this?

regards
Markus
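To make the off-by-one concrete, here is a minimal C++ sketch that simulates a bottom-tested loop and counts both quantities. It is purely illustrative and does not use any LLVM API; LoopCounts and runLoop are names invented for this example, and the loop shape (body first, exit test at the bottom) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Counts observed while running a bottom-tested (do/while style) loop.
struct LoopCounts {
  std::uint64_t Iterations;     // times the loop body executed
  std::uint64_t BackedgesTaken; // times control branched back to the header
};

// Runs a bottom-tested loop whose body executes TripCount times.
// Precondition: TripCount >= 1 (a bottom-tested loop runs at least once).
LoopCounts runLoop(std::uint64_t TripCount) {
  assert(TripCount >= 1 && "bottom-tested loop executes at least once");
  LoopCounts C{0, 0};
  std::uint64_t Remaining = TripCount;
  while (true) {
    ++C.Iterations;     // loop body
    if (--Remaining == 0)
      break;            // final iteration exits without taking the backedge
    ++C.BackedgesTaken; // branch back to the loop header
  }
  return C;
}
```

In this model a loop whose body runs 5 times takes 4 backedges, so the value an iteration-counting intrinsic receives is always one more than what a branch-counting loop instruction would want.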
Hi Markus,

It's fair to expect slight differences between targets, so HardwareLoopInfo is there to communicate these between the backend and the transform. I would suggest adding a flag in that struct to prevent the HardwareLoops pass from adding one to the 'ExitCount', as this is cleaner than adding another intrinsic.

Regards,
sam

Sam Parker
Compilation Tools Engineer | Arm
Arm.com

________________________________
From: Markus Lavin <markus.lavin at ericsson.com>
Sent: 11 July 2019 14:40
To: llvm-dev at lists.llvm.org
Cc: Sam Parker
Subject: llvm.set.loop.iterations
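The flag idea can be sketched roughly as follows. This is a toy model under stated assumptions, not the real HardwareLoopInfo definition or the HardwareLoops pass: MockHardwareLoopInfo, the CounterIsBackedgeCount flag, and loopCounterValue are all hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-target loop description.
// This is NOT the real HardwareLoopInfo struct.
struct MockHardwareLoopInfo {
  // Hypothetical flag: when true, the target wants the backedge-taken
  // count, so the transform must not add one.
  bool CounterIsBackedgeCount = false;
};

// Value the transform would hand to the loop-setup intrinsic in this model.
std::uint64_t loopCounterValue(std::uint64_t BackedgeTakenCount,
                               const MockHardwareLoopInfo &Info) {
  if (Info.CounterIsBackedgeCount)
    return BackedgeTakenCount; // target counts branches, pass value through
  // Default behaviour: iterations = backedges taken + 1.
  assert(BackedgeTakenCount != UINT64_MAX && "iteration count would wrap");
  return BackedgeTakenCount + 1;
}
```

With the flag left at its default, a backedge-taken count of 9 becomes 10; with the flag set, it stays 9. The same intrinsic operand then carries a different meaning per target, which is the trade-off this design accepts.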
Hi Sam,

Right, the HardwareLoopInfo struct is the most obvious place and I am fine with doing that. The possible downside is that we would then have a non-target-specific intrinsic whose value may be interpreted in two different ways, with no way for other passes to find out which. Maybe not very relevant right now, but in the far future some other pass might want to do something with this intrinsic (I mean, it is documented after all).

regards
Markus

________________________________
From: Sam Parker <Sam.Parker at arm.com>
Sent: Thursday, July 11, 2019 3:54 PM
To: Markus Lavin; llvm-dev at lists.llvm.org
Cc: nd
Subject: Re: llvm.set.loop.iterations
Hi Markus.

Can't you just custom lower the intrinsic at ISel to some target-specific pseudo/instruction, and then introduce an ISD::SUB at some early point during ISel (hopefully making it possible for DAGCombine to fold away the ISD::ADD/ISD::SUB pair)?

Or are you afraid that the addition has been hoisted to a different basic block than the intrinsic (since DAGCombiner is local to a BB, that would reduce the possibilities for folding away the ISD::SUB)?

Or what says that you need to wait until hwloop finalization (assuming that there is no target-generic MIR counterpart for llvm.set.loop.iterations, or is there)?

/Björn
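The add/sub folding mentioned above can be illustrated with a toy expression tree. This sketch only models the idea and does not use the SelectionDAG API; Node, reg, constant, binary, and foldSubOfAdd are hypothetical names, and an opaque Reg node stands in for a value whose defining addition lives in another basic block.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

enum class Op { Reg, Const, Add, Sub };

struct Node;
using NodePtr = std::shared_ptr<Node>;

// Toy expression node, loosely modelling a DAG node within one basic block.
struct Node {
  Op Kind;
  std::string Name;   // used by Reg
  std::int64_t Value; // used by Const
  NodePtr LHS, RHS;   // used by Add and Sub
};

NodePtr reg(std::string Name) {
  return std::make_shared<Node>(
      Node{Op::Reg, std::move(Name), 0, nullptr, nullptr});
}

NodePtr constant(std::int64_t V) {
  return std::make_shared<Node>(Node{Op::Const, "", V, nullptr, nullptr});
}

NodePtr binary(Op Kind, NodePtr L, NodePtr R) {
  assert(Kind == Op::Add || Kind == Op::Sub);
  return std::make_shared<Node>(Node{Kind, "", 0, std::move(L), std::move(R)});
}

// Folds (sub (add X, C), C) to X; returns the node unchanged otherwise.
NodePtr foldSubOfAdd(const NodePtr &N) {
  if (N->Kind != Op::Sub || N->RHS->Kind != Op::Const)
    return N;
  const NodePtr &Inner = N->LHS;
  if (Inner->Kind != Op::Add || Inner->RHS->Kind != Op::Const)
    return N;
  if (Inner->RHS->Value != N->RHS->Value)
    return N;
  return Inner->LHS; // the +C and -C cancel
}
```

When the addition is visible in the same tree, the subtract disappears and the original backedge count is recovered. When the operand is just an opaque register (the addition was hoisted elsewhere), the fold has nothing to match and the subtract stays, which is the concern about the combiner being local to a basic block.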
Hi Björn,

That sounds like another viable option, but perhaps one involving slightly more work. Intuitively, though, it seems that solving the problem at the source (i.e. not adding the constant in the first place) would be the cleanest solution, apart from the issues with that already mentioned.

I would imagine that most machine instructions realizing hw loops are more interested in the number of jumps to make than in the number of iterations a loop will run. But in this case my imagination clearly does not line up with reality, as other targets seem to want the number of iterations.

I haven't thought about the case of the addition being hoisted out of the loop pre-header, but I guess that could happen. There is nothing AFAIK that requires us to keep the llvm.set.loop.iterations intact, but it just seemed like the easiest thing to do to avoid introducing more intermediate pseudos. Need to experiment some more.

-Markus

________________________________
From: Björn Pettersson A <bjorn.a.pettersson at ericsson.com>
Sent: Friday, July 12, 2019 12:23 AM
To: Markus Lavin <markus.lavin at ericsson.com>
Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] llvm.set.loop.iterations