thr3ads.net - llvm dev - [LLVMdev] Separating loop nests based on profile information? [Jan 2015]

If this information is useful, please help other people find it:
Share via:

Xinliang David Li

2015-Jan-13 03:39 UTC

[LLVMdev] Separating loop nests based on profile information?

On Wed, Jan 7, 2015 at 5:33 PM, Chandler Carruth <chandlerc at google.com>
wrote:
>
> On Wed, Jan 7, 2015 at 5:19 PM, Philip Reames <listmail at
philipreames.com>
> wrote:
>
>> I've been playing with approaches to getting better optimization of
loops
>> which contain infrequently executed slow paths.  I've gotten as far
as
>> throwing together a proof of concept implementation of a profile guided
>> optimization to separate a single loop with multiple latches into a
loop
>> nest, but I want to get feedback from interested parties before
investing
>> much more effort.
>>
>> The primary case I'm concerned about looks something like this C
example:
>> while( some_condition )
>>   //do lots of work here
>>   if (!some_rare_condition) { <-- This is loop variant
>>     helper();
>>   }
>> }
>>
>> The key point here is that the 'do lots of work' section
contains things
>> we could lift out of the loop, but our knowledge of memory gets
completely
>> destoryed by the 'helper' call which doesn't get inlined. 
This ends up
>> crippling LICM in particular, and GVN to a lessor degree.
>>
>> The approach I've been playing with would create this loop
structure:
>> goto inner
>> while (true) {
>>   outer:
>>   helper();
>>   inner:
>>   while( some_condition )
>>     //do lots of work here
>>     if (!some_rare_condition) { <-- This is loop variant
>>       goto outer;
>>     }
>>   }
>> }
>>
>> (Excuse the psuedo-C.  Hopefully the idea is clear.)
>>
>
> Yep.
>
> I've not thought about this a lot, but I have two initial questions
that
> maybe you can answer:
>
> How does this compare with classical approaches of loop peeling,
> partitioning, fission, or whatever you might call it? Is there any
> literature behind this approach or some literature it should be compared
> with? (I genuinely don't know this area that well, so I'm of little
help
> here...)
>

It is not any of those loop transformations. It does not even change the
control flow. It is more a code layout optimization -- move the cold trace
in the loop out of the body.

>
>
> Some of your points I have quick feedback on:
>
>> Points for discussion:
>>
>>    - Is using profile information for this purpose even a reasonable
>>    thing to do?
>>
>> Yes!
>
>
>>
>>    - I chose to implement this without relying on the existing block
>>    frequency analysis. My reasoning was that a) this is a rarely taken
case
>>    and adding an expensive analysis dependency probably wasn't
worthwhile and
>>    b) that block frequency analysis was more expensive/precise than I
really
>>    needed. Is this reasonable?
>>
>> I think we should always use the analyses. Either BlockFrequency or
> BranchProbability. I think probably both in the common joint usage
> (frequency of the loop header combined with probability of the cold
region).
>
yes.

>
>>    - If so, is the notion of 'rareness' of a loop block
something that's
>>    worth extracting out on it's own and reusing? Are there other
similar uses
>>    anyone can think of?
>>    - Currently, I'm only supporting a fairly small set of
controlling
>>    conditions. Are there important cases I'm not considering?
>>
>> To both of these, I think the general combination to use is to identify
> the set of blocks dominated by a block which is in the loop body of a hot
> loop, and is cold relative to the other successors of its predecessor(s).
> These form cold "regions" as I think of them without requiring
the
> complexity of the region analysis.
>
>
Static prediction should handle it  --  call heuristics or some combination
with other heuristics (error handler recognition etc).


David

>
>>    - Since the rarest latch is often deep in a loop - with other
"if (X)
>>    continue;" (i.e. latches) before it - this tends to create
loops with
>>    multiple exiting blocks. Some of the existing passes might not deal
with
>>    this well, is that a major concern? Suggestions for how to analysis
and
>>    validate?
>>
>> I'm somewhat concerned about this, but want to think more about the
> fundamental transformation.
>
>
>>
>>    - Currently, I've structured this as pulling off the rarest
latch as
>>    an outer iteration. I could also pull off the most popular latch as
an
>>    inner iteration. This might give different tradeoffs; thoughts?
>>
>> Generally, any thoughts anyone have on the problem or approach are
>> welcome. I'm not particular attached to the particular approach
laid out
>> here and if there's a more advantageous approach, all the better.
>>
> Thanks for pushing on this! ;] Now I need to go and ponder a lot so i can
> reply more deeply on the actual transform.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/69bd8bbb/attachment.html>

Philip Reames

2015-Jan-20 01:43 UTC

head link

[LLVMdev] Separating loop nests based on profile information?

On 01/12/2015 07:39 PM, Xinliang David Li wrote:>
>
> On Wed, Jan 7, 2015 at 5:33 PM, Chandler Carruth <chandlerc at
google.com
> <mailto:chandlerc at google.com>> wrote:
>
>
>
>     How does this compare with classical approaches of loop peeling,
>     partitioning, fission, or whatever you might call it? Is there any
>     literature behind this approach or some literature it should be
>     compared with? (I genuinely don't know this area that well, so
I'm
>     of little help here...)
>
>
>
> It is not any of those loop transformations. It does not even change 
> the control flow. It is more a code layout optimization -- move the 
> cold trace in the loop out of the body.It's also a somewhat major change in the canonical form of the loop as 
seen by the optimizer.  That's probably the more important part in 
practice.>
>
>           * If so, is the notion of 'rareness' of a loop block
>             something that's worth extracting out on it's own and
>             reusing? Are there other similar uses anyone can think of?
>           * Currently, I'm only supporting a fairly small set of
>             controlling conditions. Are there important cases I'm not
>             considering?
>
>     To both of these, I think the general combination to use is to
>     identify the set of blocks dominated by a block which is in the
>     loop body of a hot loop, and is cold relative to the other
>     successors of its predecessor(s). These form cold "regions"
as I
>     think of them without requiring the complexity of the region analysis.
>
>
> Static prediction should handle it  --  call heuristics or some 
> combination with other heuristics (error handler recognition etc).
>Sorry, I'm really not sure what you're trying to say here.  Could you 
clarify/expand?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150119/fe849543/attachment.html>

Xinliang David Li

2015-Jan-20 04:48 UTC

head link

[LLVMdev] Separating loop nests based on profile information?

On Mon, Jan 19, 2015 at 5:43 PM, Philip Reames <listmail at
philipreames.com>
wrote:
>
> On 01/12/2015 07:39 PM, Xinliang David Li wrote:
>
>
>
> On Wed, Jan 7, 2015 at 5:33 PM, Chandler Carruth <chandlerc at
google.com>
> wrote:
>
>>
>>
>> How does this compare with classical approaches of loop peeling,
>> partitioning, fission, or whatever you might call it? Is there any
>> literature behind this approach or some literature it should be
compared
>> with? (I genuinely don't know this area that well, so I'm of
little help
>> here...)
>>
>
>
>  It is not any of those loop transformations. It does not even change the
> control flow. It is more a code layout optimization -- move the cold trace
> in the loop out of the body.
>
> It's also a somewhat major change in the canonical form of the loop as
> seen by the optimizer.  That's probably the more important part in
> practice.
>
right.
>
>
>
>>>    - If so, is the notion of 'rareness' of a loop block
something
>>>    that's worth extracting out on it's own and reusing? Are
there other
>>>    similar uses anyone can think of?
>>>    - Currently, I'm only supporting a fairly small set of
controlling
>>>    conditions. Are there important cases I'm not considering?
>>>
>>>    To both of these, I think the general combination to use is to
>> identify the set of blocks dominated by a block which is in the loop
body
>> of a hot loop, and is cold relative to the other successors of its
>> predecessor(s). These form cold "regions" as I think of them
without
>> requiring the complexity of the region analysis.
>>
>>
>
>  Static prediction should handle it  --  call heuristics or some
> combination with other heuristics (error handler recognition etc).
>
>    Sorry, I'm really not sure what you're trying to say here. 
Could you
> clarify/expand?
>
It is more of a comment about identify rare blocks -- there are cases where
static heuristics can produce branch prediction with high confidence.

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150119/ad77821b/attachment.html>

llvm dev - Jan 2015 - [LLVMdev] Separating loop nests based on profile information?

[LLVMdev] Separating loop nests based on profile information?

[LLVMdev] Separating loop nests based on profile information?

[LLVMdev] Separating loop nests based on profile information?