Xinliang David Li
2015-Jan-13 03:39 UTC
[LLVMdev] Separating loop nests based on profile information?
On Wed, Jan 7, 2015 at 5:33 PM, Chandler Carruth <chandlerc at google.com> wrote:> > On Wed, Jan 7, 2015 at 5:19 PM, Philip Reames <listmail at philipreames.com> > wrote: > >> I've been playing with approaches to getting better optimization of loops >> which contain infrequently executed slow paths. I've gotten as far as >> throwing together a proof of concept implementation of a profile guided >> optimization to separate a single loop with multiple latches into a loop >> nest, but I want to get feedback from interested parties before investing >> much more effort. >> >> The primary case I'm concerned about looks something like this C example: >> while( some_condition ) >> //do lots of work here >> if (!some_rare_condition) { <-- This is loop variant >> helper(); >> } >> } >> >> The key point here is that the 'do lots of work' section contains things >> we could lift out of the loop, but our knowledge of memory gets completely >> destoryed by the 'helper' call which doesn't get inlined. This ends up >> crippling LICM in particular, and GVN to a lessor degree. >> >> The approach I've been playing with would create this loop structure: >> goto inner >> while (true) { >> outer: >> helper(); >> inner: >> while( some_condition ) >> //do lots of work here >> if (!some_rare_condition) { <-- This is loop variant >> goto outer; >> } >> } >> } >> >> (Excuse the psuedo-C. Hopefully the idea is clear.) >> > > Yep. > > I've not thought about this a lot, but I have two initial questions that > maybe you can answer: > > How does this compare with classical approaches of loop peeling, > partitioning, fission, or whatever you might call it? Is there any > literature behind this approach or some literature it should be compared > with? (I genuinely don't know this area that well, so I'm of little help > here...) >It is not any of those loop transformations. It does not even change the control flow. It is more a code layout optimization -- move the cold trace in the loop out of the body.> > > Some of your points I have quick feedback on: > >> Points for discussion: >> >> - Is using profile information for this purpose even a reasonable >> thing to do? >> >> Yes! > > >> >> - I chose to implement this without relying on the existing block >> frequency analysis. My reasoning was that a) this is a rarely taken case >> and adding an expensive analysis dependency probably wasn't worthwhile and >> b) that block frequency analysis was more expensive/precise than I really >> needed. Is this reasonable? >> >> I think we should always use the analyses. Either BlockFrequency or > BranchProbability. I think probably both in the common joint usage > (frequency of the loop header combined with probability of the cold region). >yes.> >> - If so, is the notion of 'rareness' of a loop block something that's >> worth extracting out on it's own and reusing? Are there other similar uses >> anyone can think of? >> - Currently, I'm only supporting a fairly small set of controlling >> conditions. Are there important cases I'm not considering? >> >> To both of these, I think the general combination to use is to identify > the set of blocks dominated by a block which is in the loop body of a hot > loop, and is cold relative to the other successors of its predecessor(s). > These form cold "regions" as I think of them without requiring the > complexity of the region analysis. > >Static prediction should handle it -- call heuristics or some combination with other heuristics (error handler recognition etc). David> >> - Since the rarest latch is often deep in a loop - with other "if (X) >> continue;" (i.e. latches) before it - this tends to create loops with >> multiple exiting blocks. Some of the existing passes might not deal with >> this well, is that a major concern? Suggestions for how to analysis and >> validate? >> >> I'm somewhat concerned about this, but want to think more about the > fundamental transformation. > > >> >> - Currently, I've structured this as pulling off the rarest latch as >> an outer iteration. I could also pull off the most popular latch as an >> inner iteration. This might give different tradeoffs; thoughts? >> >> Generally, any thoughts anyone have on the problem or approach are >> welcome. I'm not particular attached to the particular approach laid out >> here and if there's a more advantageous approach, all the better. >> > Thanks for pushing on this! ;] Now I need to go and ponder a lot so i can > reply more deeply on the actual transform. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/69bd8bbb/attachment.html>
Philip Reames
2015-Jan-20 01:43 UTC
[LLVMdev] Separating loop nests based on profile information?
On 01/12/2015 07:39 PM, Xinliang David Li wrote:> > > On Wed, Jan 7, 2015 at 5:33 PM, Chandler Carruth <chandlerc at google.com > <mailto:chandlerc at google.com>> wrote: > > > > How does this compare with classical approaches of loop peeling, > partitioning, fission, or whatever you might call it? Is there any > literature behind this approach or some literature it should be > compared with? (I genuinely don't know this area that well, so I'm > of little help here...) > > > > It is not any of those loop transformations. It does not even change > the control flow. It is more a code layout optimization -- move the > cold trace in the loop out of the body.It's also a somewhat major change in the canonical form of the loop as seen by the optimizer. That's probably the more important part in practice.> > > * If so, is the notion of 'rareness' of a loop block > something that's worth extracting out on it's own and > reusing? Are there other similar uses anyone can think of? > * Currently, I'm only supporting a fairly small set of > controlling conditions. Are there important cases I'm not > considering? > > To both of these, I think the general combination to use is to > identify the set of blocks dominated by a block which is in the > loop body of a hot loop, and is cold relative to the other > successors of its predecessor(s). These form cold "regions" as I > think of them without requiring the complexity of the region analysis. > > > Static prediction should handle it -- call heuristics or some > combination with other heuristics (error handler recognition etc). >Sorry, I'm really not sure what you're trying to say here. Could you clarify/expand? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150119/fe849543/attachment.html>
Xinliang David Li
2015-Jan-20 04:48 UTC
[LLVMdev] Separating loop nests based on profile information?
On Mon, Jan 19, 2015 at 5:43 PM, Philip Reames <listmail at philipreames.com> wrote:> > On 01/12/2015 07:39 PM, Xinliang David Li wrote: > > > > On Wed, Jan 7, 2015 at 5:33 PM, Chandler Carruth <chandlerc at google.com> > wrote: > >> >> >> How does this compare with classical approaches of loop peeling, >> partitioning, fission, or whatever you might call it? Is there any >> literature behind this approach or some literature it should be compared >> with? (I genuinely don't know this area that well, so I'm of little help >> here...) >> > > > It is not any of those loop transformations. It does not even change the > control flow. It is more a code layout optimization -- move the cold trace > in the loop out of the body. > > It's also a somewhat major change in the canonical form of the loop as > seen by the optimizer. That's probably the more important part in > practice. >right.> > > >>> - If so, is the notion of 'rareness' of a loop block something >>> that's worth extracting out on it's own and reusing? Are there other >>> similar uses anyone can think of? >>> - Currently, I'm only supporting a fairly small set of controlling >>> conditions. Are there important cases I'm not considering? >>> >>> To both of these, I think the general combination to use is to >> identify the set of blocks dominated by a block which is in the loop body >> of a hot loop, and is cold relative to the other successors of its >> predecessor(s). These form cold "regions" as I think of them without >> requiring the complexity of the region analysis. >> >> > > Static prediction should handle it -- call heuristics or some > combination with other heuristics (error handler recognition etc). > > Sorry, I'm really not sure what you're trying to say here. Could you > clarify/expand? >It is more of a comment about identify rare blocks -- there are cases where static heuristics can produce branch prediction with high confidence. David -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150119/ad77821b/attachment.html>