Philip Reames
2015-Jan-08 19:37 UTC
[LLVMdev] Separating loop nests based on profile information?
On 01/07/2015 05:33 PM, Chandler Carruth wrote:
> On Wed, Jan 7, 2015 at 5:19 PM, Philip Reames
> <listmail at philipreames.com> wrote:
>
>     I've been playing with approaches to getting better optimization of
>     loops which contain infrequently executed slow paths. I've gotten as
>     far as throwing together a proof of concept implementation of a
>     profile guided optimization to separate a single loop with multiple
>     latches into a loop nest, but I want to get feedback from interested
>     parties before investing much more effort.
>
>     The primary case I'm concerned about looks something like this C
>     example:
>
>         while( some_condition ) {
>           // do lots of work here
>           if (!some_rare_condition) {   // <-- this is loop variant
>             helper();
>           }
>         }
>
>     The key point here is that the 'do lots of work' section contains
>     things we could lift out of the loop, but our knowledge of memory
>     gets completely destroyed by the 'helper' call which doesn't get
>     inlined. This ends up crippling LICM in particular, and GVN to a
>     lesser degree.
>
>     The approach I've been playing with would create this loop structure:
>
>         goto inner;
>         while (true) {
>         outer:
>           helper();
>         inner:
>           while( some_condition ) {
>             // do lots of work here
>             if (!some_rare_condition) {   // <-- this is loop variant
>               goto outer;
>             }
>           }
>         }
>
>     (Excuse the pseudo-C. Hopefully the idea is clear.)
>
> Yep.
>
> I've not thought about this a lot, but I have two initial questions that
> maybe you can answer:
>
> How does this compare with classical approaches of loop peeling,
> partitioning, fission, or whatever you might call it? Is there any
> literature behind this approach or some literature it should be compared
> with? (I genuinely don't know this area that well, so I'm of little help
> here...)

I'm not aware of any literature that covers the specific transform I'm
suggesting here. This is probably a lack of awareness on my part, not a
lack of literature, though. While I have some basic background in the
area, I can't claim to have made an extensive study of loop optimization
techniques, *particularly* profile guided ones.

W.r.t. the tradeoffs in practice, I'm going to answer this in a separate
response just to keep this one short. I've been playing with both
approaches and there appears to be appeal in both. I'm currently leaning
towards a mixture of both approaches.

> Some of your points I have quick feedback on:
>
>     Points for discussion:
>
>     * Is using profile information for this purpose even a reasonable
>       thing to do?
>
> Yes!
>
>     * I chose to implement this without relying on the existing block
>       frequency analysis. My reasoning was that a) this is a rarely
>       taken case and adding an expensive analysis dependency probably
>       wasn't worthwhile and b) that block frequency analysis was more
>       expensive/precise than I really needed. Is this reasonable?
>
> I think we should always use the analyses. Either BlockFrequency or
> BranchProbability. I think probably both in the common joint usage
> (frequency of the loop header combined with probability of the cold
> region).

Let me ask a basic question: what do BlockFrequency and BranchProbability
compute, and roughly what mental cost model should I have? Once I have a
BlockFrequency, what does it mean? Some of this is documented in
BlockFrequencyImpl.h, but the high-level bits are mostly missing,
particularly for BranchProbability.
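Concretely, the joint usage described above -- the frequency of the loop
header combined with the probability of the edge into the cold block --
might look something like the following sketch. This is purely
illustrative and not from any actual patch; the function name, the
single-predecessor restriction, and the 1/100 threshold are all invented
placeholders.

    #include "llvm/Analysis/BlockFrequencyInfo.h"
    #include "llvm/Analysis/BranchProbabilityInfo.h"
    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Function.h"
    #include "llvm/Support/BranchProbability.h"
    using namespace llvm;

    // Sketch: is this latch "cold" relative to the loop it belongs to?
    static bool isColdLatch(BasicBlock *Latch, Loop *L,
                            BlockFrequencyInfo &BFI,
                            BranchProbabilityInfo &BPI) {
      // Only interesting if the loop header is hot relative to the entry.
      uint64_t HeaderFreq = BFI.getBlockFreq(L->getHeader()).getFrequency();
      uint64_t EntryFreq =
          BFI.getBlockFreq(&L->getHeader()->getParent()->getEntryBlock())
              .getFrequency();
      if (HeaderFreq <= EntryFreq)
        return false;

      // The latch is "cold" if the edge into it from its predecessor is
      // rarely taken (this sketch punts on multiple predecessors).
      BasicBlock *Pred = Latch->getSinglePredecessor();
      if (!Pred)
        return false;
      return BPI.getEdgeProbability(Pred, Latch) < BranchProbability(1, 100);
    }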
I chose to write a custom analysis in large part because I didn't
understand the existing analyses, and my (possibly completely wrong)
perception is that they're overly precise/expensive for what I need. I
wanted to focus on writing the transform since that's the point I
actually wanted to discuss. :)

>     * If so, is the notion of 'rareness' of a loop block something
>       that's worth extracting out on its own and reusing? Are there
>       other similar uses anyone can think of?
>     * Currently, I'm only supporting a fairly small set of controlling
>       conditions. Are there important cases I'm not considering?
>
> To both of these, I think the general combination to use is to identify
> the set of blocks dominated by a block which is in the loop body of a
> hot loop, and is cold relative to the other successors of its
> predecessor(s). These form cold "regions" as I think of them without
> requiring the complexity of the region analysis.

I get why you find this a useful mental model, but the transform is far
harder to express this way in the couple of cases I've looked at. It's
much easier to start with a latch, reason about its hotness, and then let
the generic loop code construct the cold "region".

Then again, if you were to ask for the set of latches which were
dominated by blocks in the cold-with-hot-predecessors list, that might be
interesting. But again, I'd really like to start from the latches since
that's what controls the legality/profitability of the transform. Hm, we
could walk back along the dom tree from the latch looking for such a cwhp
block... This might be a better way to think about the profitability of
the transform, even starting from the latch. Is that what you were trying
to get at to start with? Or were you looking for something more general?

>     * Since the rarest latch is often deep in a loop - with other
>       "if (X) continue;" (i.e. latches) before it - this tends to create
>       loops with multiple exiting blocks. Some of the existing passes
>       might not deal with this well; is that a major concern?
>       Suggestions for how to analyze and validate?
>
> I'm somewhat concerned about this, but want to think more about the
> fundamental transformation.
>
>     * Currently, I've structured this as pulling off the rarest latch
>       as an outer iteration. I could also pull off the most popular
>       latch as an inner iteration. This might give different tradeoffs;
>       thoughts?
>
>     Generally, any thoughts anyone has on the problem or approach are
>     welcome. I'm not particularly attached to the particular approach
>     laid out here, and if there's a more advantageous approach, all the
>     better.
>
> Thanks for pushing on this! ;] Now I need to go and ponder a lot so I
> can reply more deeply on the actual transform.

You're welcome. :) This might be an area where it's profitable to talk in
person; are you going to be at the social tonight?

Philip
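As a rough illustration of the "walk back along the dom tree from the
latch looking for such a cwhp block" idea above, a sketch might look like
the following. It is not from any actual patch; the function name, the
all-incoming-edges-cold criterion, and the 1/100 threshold are invented
for illustration.

    #include "llvm/Analysis/BranchProbabilityInfo.h"
    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/CFG.h"
    #include "llvm/IR/Dominators.h"
    #include "llvm/Support/BranchProbability.h"
    using namespace llvm;

    // Sketch: starting from a latch, walk up the dominator tree looking
    // for a block in the loop that is "cold with hot predecessors", i.e.
    // every edge into it is unlikely to be taken.
    static BasicBlock *findColdWithHotPreds(BasicBlock *Latch, Loop *L,
                                            DominatorTree &DT,
                                            BranchProbabilityInfo &BPI) {
      for (DomTreeNode *N = DT.getNode(Latch); N; N = N->getIDom()) {
        BasicBlock *BB = N->getBlock();
        if (!L->contains(BB) || BB == L->getHeader())
          break; // walked out of the interesting part of the loop

        bool AllIncomingEdgesCold = true;
        for (BasicBlock *Pred : predecessors(BB))
          if (!(BPI.getEdgeProbability(Pred, BB) <
                BranchProbability(1, 100)))
            AllIncomingEdgesCold = false;

        if (AllIncomingEdgesCold)
          return BB; // candidate block dominating the rare latch
      }
      return nullptr;
    }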
Chandler Carruth
2015-Jan-08 19:47 UTC
[LLVMdev] Separating loop nests based on profile information?
On Thu, Jan 8, 2015 at 11:37 AM, Philip Reames <listmail at philipreames.com> wrote:
> This might be an area where it's profitable to talk in person; are you
> going to be at the social tonight?

Just to reply to this most pertinent detail: yes. =D
Duncan P. N. Exon Smith
2015-Jan-09 21:56 UTC
[LLVMdev] Separating loop nests based on profile information?
> On 2015-Jan-08, at 11:37, Philip Reames <listmail at philipreames.com> wrote:
>
> Let me ask a basic question: what do BlockFrequency and BranchProbability
> compute, and roughly what mental cost model should I have? Once I have a
> BlockFrequency, what does it mean? Some of this is documented in
> BlockFrequencyImpl.h, but the high-level bits are mostly missing,
> particularly for BranchProbability.

*Branch weight*: An integer attached to an outgoing edge from a basic
block that represents the relative likelihood that it will be taken over
the block's other outgoing edges.

*Branch probability*: Assuming a basic block is executed, what's the
probability that it will terminate via a given outgoing edge? This is the
branch probability for that edge: the edge's branch weight divided by the
sum of the branch weights for the basic block. If we don't have profile
information, branch weights (and thus probabilities) are computed using
heuristics in -branch-prob.

*Block frequency*: Assume a function is executed many times -- call this
large number the block frequency of the entry block. Given that, how many
times is each basic block executed? This is the block frequency of that
block. Frequencies are calculated from branch probabilities in
-block-freq. You can compare the relative "hotness" of two basic blocks
by comparing their block frequencies. (A block frequency means nothing in
isolation -- you need to compare it to a reference block frequency, like
the frequency of a loop header, or of the entry block, or of something
else, for it to have any meaning.)

*Block bias* (PR20316 [1]): Is this basic block more or less likely to be
executed than its "peers"? The current proposal calculates a "reference"
block frequency by assuming all branch weights are 1 (and, thus, that all
branch probabilities are even), and defines the block bias to be the
ratio of the block frequency to its "reference" block frequency. It may
have some bitrot, but it wouldn't be hard for me to clean up and commit
(if it's even a useful metric).

After I wrote this out, I remembered that I already documented this [2].
(If you think the descriptions I just gave here are more useful than
what's in tree, please improve the in-tree version!)

[1]: http://llvm.org/bugs/show_bug.cgi?id=20316
[2]: http://llvm.org/docs/BlockFrequencyTerminology.html
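To put rough numbers on the terminology above (the numbers are invented
purely for illustration): suppose a function's single loop runs about 50
iterations per call, and the block ending the loop body branches to a
cold block with weight 2 and back towards the latch with weight 98.

    branch probabilities:  body -> cold    2 / (2 + 98) =  2%
                           body -> latch  98 / (2 + 98) = 98%

    block frequencies (taking the entry block as 1.0):
                           header ~ body ~ 50    (50 iterations per call)
                           cold   ~ 50 * 0.02 =  1
                           latch  ~ 50 * 0.98 = 49

So the cold block still executes about once per call -- it isn't "never
taken"; it's only cold relative to the loop header, which is exactly the
kind of relative comparison described above.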
Daniel Berlin
2015-Jan-12 06:04 UTC
[LLVMdev] Separating loop nests based on profile information?
On Thu, Jan 8, 2015 at 2:37 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> On 01/07/2015 05:33 PM, Chandler Carruth wrote:
>
> On Wed, Jan 7, 2015 at 5:19 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>> I've been playing with approaches to getting better optimization of
>> loops which contain infrequently executed slow paths. I've gotten as far
>> as throwing together a proof of concept implementation of a profile
>> guided optimization to separate a single loop with multiple latches into
>> a loop nest, but I want to get feedback from interested parties before
>> investing much more effort.
>>
>> The primary case I'm concerned about looks something like this C example:
>>
>>     while( some_condition ) {
>>       // do lots of work here
>>       if (!some_rare_condition) {   // <-- this is loop variant
>>         helper();
>>       }
>>     }
>>
>> The key point here is that the 'do lots of work' section contains things
>> we could lift out of the loop, but our knowledge of memory gets
>> completely destroyed by the 'helper' call which doesn't get inlined.
>> This ends up crippling LICM in particular, and GVN to a lesser degree.
>>
>> The approach I've been playing with would create this loop structure:
>>
>>     goto inner;
>>     while (true) {
>>     outer:
>>       helper();
>>     inner:
>>       while( some_condition ) {
>>         // do lots of work here
>>         if (!some_rare_condition) {   // <-- this is loop variant
>>           goto outer;
>>         }
>>       }
>>     }
>>
>> (Excuse the pseudo-C. Hopefully the idea is clear.)
>
> Yep.
>
> I've not thought about this a lot, but I have two initial questions that
> maybe you can answer:
>
> How does this compare with classical approaches of loop peeling,
> partitioning, fission, or whatever you might call it? Is there any
> literature behind this approach or some literature it should be compared
> with? (I genuinely don't know this area that well, so I'm of little help
> here...)
>
> I'm not aware of any literature that covers the specific transform I'm
> suggesting here. This is probably a lack of awareness on my part, not a
> lack of literature though.

http://webdocs.cs.ualberta.ca/~amaral/papers/BartonCC05.pdf

This is essentially index set splitting with the goal of it being an
enabling optimization for LICM. (At least, as I've seen it implemented in
other compilers.)
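For reference, classical index set splitting divides a loop's iteration
range at a point that is known before the loop executes, so that a
range-dependent test disappears from each resulting loop body. A made-up
example, not taken from the paper above:

    /* Before: one loop, with a range test interleaving the two cases. */
    void before(int n, int m, double *a, const double *b, double inv) {
      for (int i = 0; i < n; i++) {
        if (i < m)
          a[i] = b[i] * inv;
        else
          a[i] = 0.0;
      }
    }

    /* After index set splitting (assumes 0 <= m <= n; a real
       implementation would guard or clamp the split point): m is known
       before the loop runs, so each resulting loop has a branch-free
       body and is easier for LICM and friends to work on. */
    void after(int n, int m, double *a, const double *b, double inv) {
      for (int i = 0; i < m; i++)
        a[i] = b[i] * inv;
      for (int i = m; i < n; i++)
        a[i] = 0.0;
    }

The contrast with the profile-guided transform discussed above is that
here the split point is a value of the induction variable known up front,
rather than a rarely taken condition discovered from profile data.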
Krzysztof Parzyszek
2015-Jan-12 18:55 UTC
[LLVMdev] Separating loop nests based on profile information?
On 1/12/2015 12:04 AM, Daniel Berlin wrote:
> This is essentially index set splitting with the goal of it being an
> enabling optimization for LICM.

This is very different from index set splitting. In ISS you know ahead
of time into what ranges you can divide the values of the induction
variable. Here you don't.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation