Daniel Berlin
2015-Jan-12 06:35 UTC
[LLVMdev] Separating loop nests based on profile information?
> Looking at what LLVM does, the failing on the PRE side is that our
> PRE/GVN models are not strong enough to handle this. I'm not sure at
> all why we think anything else is necessary.

By this I mean we don't actually perform real PRE for loads. It's a
known deficiency, and one that does not require any duplicate heuristics
to solve.

Right now we perform some availability-based removal that happens to be
willing to do an insertion here or there to try to move a load in some
cases (it will only do a single insertion). It's also not based on real
availability, but on finding the nearest dependencies of loads and
squinting at them (which is not even really the same as finding where a
load stops being available), funnily enough.

It's also not real PRE, just some code that is willing to insert one
load to move a load.

If you want this case to work, what you need is a real load PRE
implementation, not one based on simple memdep-based load moving.

Trying to hack this one up with profile-guided code duplication, etc.,
to get it to kinda sorta work seems to me to be a bad idea.

> It's certainly not requiring special code duplication heuristics, etc.
>
> So either you are thinking of a case that is different from the above,
> or I am seriously confused :)
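To make the "insert one load to move a load" behavior concrete, it looks roughly like the C sketch below. This is a hand-written illustration under assumed names (struct S, other()), not LLVM's actual output.

/* Before: the load of p->x after the merge is partially redundant.
 * It is already available on the 'then' path, but not on the 'else'
 * path, where other(p) may clobber it. */
struct S { int x; };
void other(struct S *p);

int before(struct S *p, int c) {
    int a = 0;
    if (c)
        a = p->x;         /* p->x loaded here        */
    else
        other(p);         /* ...but not on this path */
    return a + p->x;      /* partially redundant load */
}

/* After: one load is inserted on the path that did not already have
 * the value, and the later, now fully redundant load is replaced by
 * the merged value. */
int after(struct S *p, int c) {
    int a = 0, t;
    if (c) {
        t = p->x;
        a = t;
    } else {
        other(p);
        t = p->x;         /* the single inserted load */
    }
    return a + t;         /* original load removed    */
}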
Philip Reames
2015-Jan-12 18:08 UTC
[LLVMdev] Separating loop nests based on profile information?
On 01/11/2015 10:35 PM, Daniel Berlin wrote:
>> Looking at what LLVM does, the failing on the PRE side is that our
>> PRE/GVN models are not strong enough to handle this. I'm not sure at
>> all why we think anything else is necessary.
>
> By this I mean we don't actually perform real PRE for loads. It's a
> known deficiency, and one that does not require any duplicate
> heuristics to solve.
>
> Right now we perform some availability-based removal that happens to
> be willing to do an insertion here or there to try to move a load in
> some cases (it will only do a single insertion). It's also not based
> on real availability, but on finding the nearest dependencies of loads
> and squinting at them (which is not even really the same as finding
> where a load stops being available), funnily enough.
>
> It's also not real PRE, just some code that is willing to insert one
> load to move a load.

Er, I think we have a terminology problem here. From what I can gather,
our current implementation *is* an implementation of classic PRE. It's
more or less directly organized around the classic "available" and
"anticipated" concepts from the literature. The *implementation* has
some limitations, but the conceptual framework is there*. One of the
key invariants in classic PRE is that you do not change program
behaviour on any path. As a result, moving (or at most duplicating)
loads is pretty much all that happens.

* It took me a few hours of squinting at the code to recognize that
weird backwards walk as being an implementation of "anticipated", but
it is. The "available" part comes from MemoryDependenceAnalysis and the
forward dataflow algorithm based on those inputs.

If you want to start inserting loads on paths which didn't previously
have them - for example the loop preheader in your example in the
previous email - this is not classic PRE as I'm familiar with it. It
can be very profitable and worthwhile, but it is not classic PRE. I'm
not sure what this variant is known as in the literature.

(If I'm being overly pedantic here, or just flat out wrong, please say
so. I'm still exploring related material and am trying to categorize
things in my own head.)

I think we're actually in reasonable agreement that we do need to move
beyond 'classic PRE' as I called it above. If you have specific
suggestions, positive or negative examples, or other thoughts to share,
I'd very much value the input.

> If you want this case to work, what you need is a real load PRE
> implementation, not one based on simple memdep-based load moving.
>
> Trying to hack this one up with profile-guided code duplication, etc.,
> to get it to kinda sorta work seems to me to be a bad idea.

Not sure I really agree here, but I'm open to pushing as far as we can
with static heuristics first. I suspect that's likely to be more stable
w.r.t. optimization effect and applicable to more cases.

> It's certainly not requiring special code duplication heuristics, etc.
>
> So either you are thinking of a case that is different from the above,
> or I am seriously confused :)
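For reference, the "available" and "anticipated" sets referred to above are, in their textbook (Morel-Renvoise style) formulation, the solutions of the two bit-vector dataflow problems below. This is a sketch of the standard equations from the literature, not a description of GVN's actual code; GEN, ANTGEN, and KILL are the usual per-block local sets.

\begin{align*}
\mathrm{AVAIL}_{\mathrm{in}}(b)  &= \bigcap_{p \in \mathrm{pred}(b)} \mathrm{AVAIL}_{\mathrm{out}}(p)
  && \text{(forward; } \mathrm{AVAIL}_{\mathrm{in}}(\mathrm{entry}) = \emptyset\text{)} \\
\mathrm{AVAIL}_{\mathrm{out}}(b) &= \mathrm{GEN}(b) \cup \bigl(\mathrm{AVAIL}_{\mathrm{in}}(b) \setminus \mathrm{KILL}(b)\bigr) \\
\mathrm{ANTIC}_{\mathrm{out}}(b) &= \bigcap_{s \in \mathrm{succ}(b)} \mathrm{ANTIC}_{\mathrm{in}}(s)
  && \text{(backward; } \mathrm{ANTIC}_{\mathrm{out}}(\mathrm{exit}) = \emptyset\text{)} \\
\mathrm{ANTIC}_{\mathrm{in}}(b)  &= \mathrm{ANTGEN}(b) \cup \bigl(\mathrm{ANTIC}_{\mathrm{out}}(b) \setminus \mathrm{KILL}(b)\bigr)
\end{align*}

Classic PRE only inserts an expression at points where it is anticipated, i.e. where every path leaving that point would compute it anyway, which is what gives the "no path acquires a new computation" invariant mentioned above.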
Mehdi Amini
2015-Jan-12 18:35 UTC
[LLVMdev] Separating loop nests based on profile information?
> On Jan 12, 2015, at 10:08 AM, Philip Reames <listmail at philipreames.com> wrote:
>
> [...]
>
> * It took me a few hours of squinting at the code to recognize that
> weird backwards walk as being an implementation of "anticipated", but
> it is.

Did you add some comments? It may save a few hours of squinting to
someone else in the future :)

Mehdi
Daniel Berlin
2015-Jan-12 21:11 UTC
[LLVMdev] Separating loop nests based on profile information?
On Mon, Jan 12, 2015 at 1:08 PM, Philip Reames <listmail at philipreames.com> wrote:
> On 01/11/2015 10:35 PM, Daniel Berlin wrote:
>> [...]
>
> Er, I think we have a terminology problem here. From what I can
> gather, our current implementation *is* an implementation of classic
> PRE.

Scalar, yes; load PRE, no.

What load PRE does is not the same as availability. It finds the
nearest dependencies (be they load/load, store/load, or clobber) for a
load walking backwards. That is not availability. Availability is a
forward problem, not a backwards one :)

It tries to transform the data it gets from memdep into something like
availability info, but you *will not get the same results* in some
cases.

> It's more or less directly organized around the classic "available"
> and "anticipated" concepts from the literature. The *implementation*
> has some limitations, but the conceptual framework is there*.

I agree the anticipation calculation is somewhat there, but even that
is not a standard PRE one.

> One of the key invariants in classic PRE is that you do not change
> program behaviour on any path.

Well, it's more "do not increase the number of computations on any
path", but even that gets violated in most standard PREs because they
do LICM out of loops.

> As a result, moving (or at most duplicating) loads is pretty much all
> that happens.

Sure.

> * It took me a few hours of squinting at the code to recognize that
> weird backwards walk as being an implementation of "anticipated", but
> it is. The "available" part comes from MemoryDependenceAnalysis and
> the forward dataflow algorithm based on those inputs.

MemoryDependenceAnalysis is a backwards walker; it does not compute
availability of a load. It computes the nearest dependencies. We then
try to take this and turn it into the result of a forward availability
problem.

> If you want to start inserting loads on paths which didn't previously
> have them - for example the loop preheader in your example in the
> previous email - this is not classic PRE as I'm familiar with it. It
> can be very profitable and worthwhile, but it is not classic PRE. I'm
> not sure what this variant is known as in the literature.

This is definitely a standard SSA-based PRE algorithm. Every single one
I am familiar with will move code out of loops like this, be it scalar
or loads, regardless of the fact that it does, in fact, increase the
number of possible computations by one if the loop does not execute.

> (If I'm being overly pedantic here, or just flat out wrong, please say
> so. I'm still exploring related material and am trying to categorize
> things in my own head.)

I do not believe you will find an SSA-based PRE that will not do the
above optimization by default.

> I think we're actually in reasonable agreement that we do need to move
> beyond 'classic PRE' as I called it above. If you have specific
> suggestions, positive or negative examples, or other thoughts to
> share, I'd very much value the input.
>
>> If you want this case to work, what you need is a real load PRE
>> implementation, not one based on simple memdep-based load moving.
>>
>> Trying to hack this one up with profile-guided code duplication,
>> etc., to get it to kinda sorta work seems to me to be a bad idea.
>
> Not sure I really agree here, but I'm open to pushing as far as we can
> with static heuristics first. I suspect that's likely to be more
> stable w.r.t. optimization effect and applicable to more cases.

So, LLVM has been through three PRE implementations. It had an
implementation of e-path PRE, an implementation of GVNPRE (that only
worked on scalar code, I believe) a long, long time ago, and then the
current one. The current one is used because it's fast and catches a
lot of cases. The others were eliminated because they were essentially
experimental, research-quality implementations, and were slow :)

Because the current PRE is not value based, and it really has no idea
about memory, you will run into limitations on loads pretty quickly
(particularly loads indexed by things like induction variables). It
already tries to work around some of them by doing some actual value
analysis of the load (and doing forwarding), but this only works in
basic cases.

GVNPRE can be implemented and made a lot faster than it was (in GCC,
it's actually one of the fastest high-level optimization passes) with
very careful engineering, and would take care of all of these cases.
I'm also fairly positive you could take any good anticipation-based PRE
algorithm you can find, and I could make it value based and still fast.

But in the end, pushing the current one seems like a bad idea to me
because it wasn't designed for what you are trying to do - get good PRE
results and do good load PRE - and we know of better ways to do that.
All pushing it will do is push it out of the sweet spot it was designed
for.

If you want to get into profile-guided placement or speculative PRE, it
involves solving min-cut problems on very large graphs in most cases.
The only implementation I'm aware of that is relatively sane is the one
described in
http://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-horspool.pdf
and
https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1
(this one does not require solving min-cut problems).

In the end though, you are doing the work, so you get to decide what
you want to do :)
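The loop-preheader disagreement is easiest to see in a small example. Below is a hand-written C sketch (names made up, not compiler output) of the hoist being discussed.

/* Before: the global g is loaded on every iteration, and not at all
 * if the loop body never runs. */
extern int g;

int sum_before(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i] * g;        /* g loaded each iteration */
    return s;
}

/* After a PRE/LICM-style hoist: the load of g moves to the preheader.
 * If n <= 0, the original code never loaded g, so this inserts a load
 * on a path that previously had none -- the "increase the number of
 * computations by one if the loop does not execute" case. */
int sum_after(const int *a, int n) {
    int s = 0;
    int g_hoisted = g;        /* hoisted load */
    for (int i = 0; i < n; i++)
        s += a[i] * g_hoisted;
    return s;
}

A strict "no path gains a computation" formulation would refuse the hoist because of the n <= 0 path, whereas LICM and the SSA-based PRE/GVNPRE algorithms mentioned above perform it anyway (for loads, subject to the access being safe to speculate).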
Xinliang David Li
2015-Jan-13 05:44 UTC
[LLVMdev] Separating loop nests based on profile information?
On Mon, Jan 12, 2015 at 10:08 AM, Philip Reames <listmail at philipreames.com> wrote:
> [...]
>
> Not sure I really agree here, but I'm open to pushing as far as we can
> with static heuristics first. I suspect that's likely to be more
> stable w.r.t. optimization effect and applicable to more cases.

I agree with Danny. Profile data is useful for code placement, and for
the profitability computation in speculative PRE. I don't see how your
proposed code movement can be an enabler for more PRE.

David
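As a rough illustration of the kind of profitability computation profile data enables in speculative PRE (all names here are hypothetical; this is a sketch, not an existing LLVM API):

#include <stddef.h>
#include <stdint.h>

/* Sketch: decide whether speculatively inserting a computation (e.g. a
 * load) into one block is worthwhile, by comparing the dynamic
 * executions it adds against the dynamic executions of the redundant
 * occurrences it would let us delete.  Block frequencies would come
 * from profile data. */
int speculative_pre_is_profitable(uint64_t insert_block_freq,
                                  const uint64_t *redundant_block_freqs,
                                  size_t num_redundant_blocks) {
    uint64_t removed = 0;
    for (size_t i = 0; i < num_redundant_blocks; i++)
        removed += redundant_block_freqs[i];
    /* Profitable only if we delete more dynamic computations than we add. */
    return removed > insert_block_freq;
}

The min-cut formulations Danny links generalize this idea: execution frequencies become edge capacities, and the cheapest set of insertion points is chosen as a minimum cut rather than checked one block at a time.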