thr3ads.net - llvm dev - [llvm-dev] Implement Loop Fusion Pass [Mar 2016]

Adam Nemet via llvm-dev

2016-Feb-24 02:18 UTC

[llvm-dev] Implement Loop Fusion Pass

> On Feb 22, 2016, at 6:27 AM, Vikram TV <vikram.tarikere at gmail.com> wrote:
>
> On Fri, Feb 19, 2016 at 10:46 PM, Vikram TV <vikram.tarikere at gmail.com> wrote:
> Hi,
> 
> Thanks for the reply. A few thoughts inline.
> 
> On Fri, Feb 19, 2016 at 8:00 AM, Adam Nemet <anemet at apple.com> wrote:
>> Hi Vikram,
>>
>> On Feb 18, 2016, at 9:21 AM, Vikram TV via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> Hi all,
>> 
>> I have created a patch (up for review at
>> http://reviews.llvm.org/D17386) that implements Loop Fusion.
>> 
>> Approach:
>> Legality: Currently it can fuse two adjacent loops whose iteration
>> spaces are the same and which are at the same depth.
>>
>> Dependence legality: Currently, dependence legality cannot be checked
>> across loops. Hence the loops are cloned along a versioned path,
>> unconditionally fused along that path, and then dependence legality is
>> checked on the fused loop, keeping the instructions from the original
>> loops in context. Fusion is illegal if there is a backward dependence
>> between memory accesses whose source was in the first loop and whose
>> sink was in the second loop.
>> Currently, LoopAccessAnalysis is used to check dependence legality.
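
To make the dependence-legality rule concrete, here is a minimal Python sketch. It is not LLVM code and not taken from the patch; the function names, the padded array layout and the `offset` parameter are all made up for illustration. The first loop writes `a`, the second loop reads `a` at a shifted index, and the sketch compares running the loops back to back against running a single fused loop.

```python
def run_separate(b, offset):
    """Run L1 and L2 one after the other (the original program order)."""
    n = len(b)
    a = [0] * (n + 2)   # one padding slot on each side keeps a[i + 1 + offset] in range
    c = [0] * n
    for i in range(n):              # L1: writes a
        a[i + 1] = b[i] + 1
    for i in range(n):              # L2: reads a at a shifted index
        c[i] = a[i + 1 + offset] * 2
    return c


def run_fused(b, offset):
    """Run both loop bodies in a single loop (the fused candidate)."""
    n = len(b)
    a = [0] * (n + 2)
    c = [0] * n
    for i in range(n):
        a[i + 1] = b[i] + 1             # body of L1
        c[i] = a[i + 1 + offset] * 2    # body of L2
    return c


def fusion_preserves_result(b, offset):
    """True if fusing gives the same result as the original two loops.
    Only offsets -1, 0 and 1 are supported by the padding used above."""
    return run_separate(b, offset) == run_fused(b, offset)
```

With `offset` 0 or -1 the second body only reads elements that the fused loop has already written, so fusion is harmless. With `offset` 1 the second body in iteration i reads an element that the first body writes only in iteration i + 1, so the fused loop sees a stale value; this is the kind of dependence that makes fusion illegal.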
> 
>> Thanks for writing up the design here.
>>
>> I think we have a pretty strong policy against creating temporary
>> instructions, and here you actually create an entire loop just to
>> check legality.
>
> I didn't understand the consequences here. A subsequent DCE pass or
> explicit removal in this case takes care of the temporaries. Any
> pointers in this regard would be helpful.

Efficiency?  All you need for legality is the dependences, so why not
analyze those rather than recreate the entire underlying state from the
lowest levels (i.e. instructions), with a bunch of data structures and
analyses on top, all of which you will throw away at the end?

Here is one pointer:
http://article.gmane.org/gmane.comp.compilers.llvm.cvs/300603. You may
be able to find more by searching the archives.
> 
>> It would probably be a better design to add the capability of adding
>> two LAI objects together.  This would effectively simulate the fusion
>> on the analysis side so you could query the legality from that.
>
> I am not sure how an underlying analysis like SCEV would behave in this
> case. As per my understanding, it queries for a particular loop, while
> we would have populated accesses from two different loops. But assuming
> that it works, we would lose the ability to try/test using
> DependenceAnalysis in future. Currently, it is very easy to replace
> LoopAccessAnalysis with DependenceAnalysis.

Yes, the SCEV part could be problematic.  I am wondering if LAA could
analyze pointers on the same underlying object without calling SCEV’s
getMinusSCEV().  E.g. deciding that the dependence distance is 1 between
{A, +, 1}<L1> and {A+1, +, 1}<L2>, assuming we added those two
recurrences in the same loop, should not be hard.
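
A rough Python sketch of that idea follows. It does not use ScalarEvolution or any LLVM API; `AddRec` and `distance_if_fused` are hypothetical names, and the model only handles recurrences with a constant start offset and a constant step on a named underlying object. The loop tag is carried along but deliberately ignored when computing the distance, which is the "as if they were in the same loop" assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddRec:
    """Toy stand-in for a recurrence {base + start, +, step}<loop>."""
    base: str    # underlying object, e.g. "A"
    start: int   # constant offset from the base in iteration 0
    step: int    # constant per-iteration stride
    loop: str    # loop the recurrence was built for, e.g. "L1"


def distance_if_fused(src: AddRec, dst: AddRec) -> Optional[int]:
    """Iteration distance between src and dst, pretending both recurrences
    advance in the same loop. A result d means src touches in iteration
    j + d the address that dst touches in iteration j. Returns None when
    this simple model cannot decide."""
    if src.base != dst.base:
        return None                  # different underlying objects
    if src.step != dst.step or src.step == 0:
        return None                  # strides differ or are loop-invariant
    diff = dst.start - src.start
    if diff % src.step != 0:
        return None                  # the two address sequences never meet
    return diff // src.step
```

For the example above, `distance_if_fused(AddRec("A", 0, 1, "L1"), AddRec("A", 1, 1, "L2"))` evaluates to 1 without subtracting expressions that belong to different loops.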

I would certainly prefer keeping the heavy lifting for this outside of
ScalarEvolution, which is already pretty complex.  We could of course still
refactor parts of SCEV if that is helpful for LAA to work this out.

I am not sure I understand your LAA vs. DA argument.  They are already
pretty different (e.g. memchecks).  I see DA more as a drop-in replacement
for MemoryDepChecker inside LAA.

Adam
> 
>> Specifically, you could check whether you have backward dependences
>> from instructions in L2 to instructions in L1, which would be illegal.
>>
>> As a side effect you’d also get the total set of memchecks, which you
>> could filter to only include checks where the participating pointers
>> come from different loops.  (This is quite similar to LoopDistribution.)
>
> I am happy to add a routine in a subsequent patch that filters the
> checks.
>
> Just to clarify, I meant filtering the runtime checks, which is
> currently not done in the patch.
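
The filtering step could look roughly like the Python sketch below. This is an illustration only, not the LoopAccessAnalysis or LoopDistribution code: a check is modeled as a plain pair of pointer names, and `loop_of` is a hypothetical map from each pointer to the loop it came from.

```python
def filter_cross_loop_checks(checks, loop_of):
    """Keep only the runtime checks whose two pointers come from different
    original loops. Accesses from the same original loop keep their
    relative order after fusion, so this sketch assumes their checks add
    nothing new for the fusion decision."""
    return [(p, q) for (p, q) in checks if loop_of[p] != loop_of[q]]


# Example: a and b are accessed in L1, c and d in L2.
loop_of = {"a": "L1", "b": "L1", "c": "L2", "d": "L2"}
all_checks = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
needed = filter_cross_loop_checks(all_checks, loop_of)
```

Here `needed` holds only the pairs `("a", "c")` and `("b", "d")`; the two intra-loop pairs are dropped.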
> 
>> Also I don’t think it should be too hard to teach LVer to be able to
>> version two consecutive loops (or arbitrary CFG?).
>
> I think so. Instead of Loop Versioning itself deciding whether to
> version, the code can be factored out so that it also versions
> unconditionally when requested by the pass that uses it.
>
>> Let me know what you think,
>> Adam
> 
>> 
>> A basic diagram below tries to explain the approach taken to test
>> dependence legality on two adjacent loops (L1 and L2).
>> 
>>     L1PH        (PH: Preheader)
>>     |               
>>     L1              
>>     |               
>>     CB (L1Exit/L2PH: ConnectingBlock (CB) )
>>     |               
>>     L2              
>>     |               
>>     L2Exit
>> 
>> is versioned as:
>> 
>>        BooleanBB
>>         /     \
>>      L1PH   L1PH.clone
>>       |        |
>>      L1     L1.clone
>>       |        |
>>      CB     CB.clone
>>       |        |
>>      L2     L2.clone
>>         \     /
>>         L2Exit
>> 
>> And fused as:
>> 
>>        BooleanBB
>>         /     \
>>      L1PH   FusedPH
>>       |        |
>>      L1     L1Blocks
>>       |        |      \
>>      CB     L2Blocks   |
>>       |        |      /
>>      L2        |
>>        \      /
>>       CommonExit
>> 
>> Profitability: Yet to be added.
>> 
>> Further, depending on whether legality and profitability succeed, the
>> fused loop is either retained or removed. If runtime checks are
>> necessary, both the original and the fused loops are retained;
>> otherwise the original loops are removed.
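
The case where both versions are retained can be pictured with the Python sketch below. It is an assumption-laden model, not the patch: memory is a flat Python list, each array is an offset into it, the loop bodies are invented, and `overlap` stands in for whatever runtime checks the pass would emit. The `if` in `versioned` plays the role of the branch in BooleanBB.

```python
def overlap(p, q, n):
    """True if the n-element ranges starting at p and q share an address."""
    return p < q + n and q < p + n


def original_loops(mem, src1, dst1, src2, dst2, n):
    for i in range(n):                      # L1
        mem[dst1 + i] = mem[src1 + i] + 1
    for i in range(n):                      # L2
        mem[dst2 + i] = mem[src2 + i] * 2


def fused_loop(mem, src1, dst1, src2, dst2, n):
    for i in range(n):                      # both bodies in one loop
        mem[dst1 + i] = mem[src1 + i] + 1
        mem[dst2 + i] = mem[src2 + i] * 2


def versioned(mem, src1, dst1, src2, dst2, n):
    """Pick a version at run time; returns the name of the version that ran."""
    # Only cross-loop pairs that involve at least one write are checked.
    conflict = (overlap(dst1, src2, n)
                or overlap(dst1, dst2, n)
                or overlap(src1, dst2, n))
    if conflict:
        original_loops(mem, src1, dst1, src2, dst2, n)
        return "original"
    fused_loop(mem, src1, dst1, src2, dst2, n)
    return "fused"
```

The check here is conservative: it sends every overlapping case to the original loops, even those where fusion would happen to be safe. When no check is needed at all, the branch and the original loops can be dropped, which matches the "otherwise the original loops are removed" case.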
>> 
>> Currently, I have scheduled the fusion pass after the distribution
>> pass. Such a schedule negates the effect of the other pass, but given
>> that the distribution (and fusion) pass is experimental and off by
>> default, I felt it was okay to schedule it that way until a global
>> profitability model is implemented.
>> 
>> Please share your feedback about the design and implementation.
>> 
>> Thank you
>> -- 
>> 
>> Good time...
>> Vikram TV
>> CompilerTree Technologies
>> Mysore, Karnataka, INDIA
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Vikram TV via llvm-dev

2016-Mar-02 16:44 UTC

[llvm-dev] Implement Loop Fusion Pass

Hi,

I will try to prototype your idea to see how SCEV would behave and then get
back.

And sorry for replying very late! Was out on vacation.

Thank you

-- 

Good time...
Vikram TV
CompilerTree Technologies
Mysore, Karnataka, INDIA

Adam Nemet via llvm-dev

2016-Mar-02 20:45 UTC

[llvm-dev] Implement Loop Fusion Pass

> On Mar 2, 2016, at 8:44 AM, Vikram TV <vikram.tarikere at gmail.com> wrote:
>
> Hi,
>
> I will try to prototype your idea to see how SCEV would behave and then
> get back.

Great!  Let me know if you need help with anything.  It would be great to
get loop fusion supported in LLVM.

Adam
> And sorry for replying very late! Was out on vacation.
> 
> Thank you
