Vikram TV via llvm-dev
2016-Jun-23 19:06 UTC
[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis
On Thu, Jun 23, 2016 at 11:34 PM, Adam Nemet <anemet at apple.com> wrote:> > > >> Hi Vikram, >> >> Is the analysis result specific to a loop nest or to a loop nest together >> with a set of reference groups? >> > The result is specific to each loop in the loop nest and the calculations > are based on the references in the loop nest. > > > Sorry can you please rephrase/elaborate, I don’t understand what you > mean. Analysis results are retained unless a transformation invalidates > them. My question is how the reference groups affect all this. >Sorry, I now understood that you meant llvm's analysis result (I thought how the analysis calculates the result). The analysis result is specific to each loop in its loop nest. Reference groups are necessary during cost computation only.> > You could probably describe how you envision this analysis would be used > with something like Loop Fusion. >For Loop Interchange, the cost calculated for each loop can provide a desired ordering of the loops. Loop interchange can start interchanging the loops to match the desired order iff interchange legality check passes. Eg: For matrix multiplication case, for (i = 0 to 5000) for (j = 0 to 5000) for (k = 0 to 5000) C[i,j] = C[i,j] + A[i,k]* B[k,j] Here, based on cache line size and the cache penalties for the references (please refer to first mail of the thread or the reference paper for different penalties), following costs result: Loop i cost: 2.501750e+11 Loop j cost: 6.256252e+10 Loop k cost: 1.563688e+11 Loop cost here is nothing but the total cache penalities due to all references, including penalties due to outer loops. So lower costing loop is a better fit for innermost loop because it has better cache locality. This would result in the desired loop order: i, k, j; and LoopInterchange can interchange 'j' and 'k' loops given the legality check passes (a nearest ordering for which legality is correct is also an optimal loop ordering). For fusion/fission, loop costs due to cache can be calculated by assigning cache penalties for references with respect to loops before and after fusion/fission. The costs will be a profitability measure for fusion/fission. I think use case description was very brief in a previous mail. So I have elaborated this time. More details can also be found in http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf. Thanks Good time... Vikram TV CompilerTree Technologies Mysore, Karnataka, INDIA -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160624/ce6ec1a2/attachment.html>
Vikram TV via llvm-dev
2016-Jul-05 01:47 UTC
[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis
On Fri, Jun 24, 2016 at 12:36 AM, Vikram TV <vikram.tarikere at gmail.com> wrote:> > > On Thu, Jun 23, 2016 at 11:34 PM, Adam Nemet <anemet at apple.com> wrote: > >> >> >> >>> Hi Vikram, >>> >>> Is the analysis result specific to a loop nest or to a loop nest >>> together with a set of reference groups? >>> >> The result is specific to each loop in the loop nest and the calculations >> are based on the references in the loop nest. >> >> >> Sorry can you please rephrase/elaborate, I don’t understand what you >> mean. Analysis results are retained unless a transformation invalidates >> them. My question is how the reference groups affect all this. >> > > Sorry, I now understood that you meant llvm's analysis result (I thought > how the analysis calculates the result). The analysis result is specific to > each loop in its loop nest. Reference groups are necessary during cost > computation only. >Hi Adam, Do you have any concerns/suggestions regarding the analysis results? I just want to reiterate that the analysis calculates the cost for each loop in its loop nest. Reference groups affect this only during computation of the loop cost i.e. in order to compute the cost, the analysis need to analyze each of the references in the nest. Please do let me know. Thanks! -- Good time... Vikram TV CompilerTree Technologies Mysore, Karnataka, INDIA -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160705/ad31ba55/attachment.html>
Adam Nemet via llvm-dev
2016-Jul-05 18:09 UTC
[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis
> On Jun 23, 2016, at 12:06 PM, Vikram TV <vikram.tarikere at gmail.com> wrote: > > > > On Thu, Jun 23, 2016 at 11:34 PM, Adam Nemet <anemet at apple.com <mailto:anemet at apple.com>> wrote: > >> >> >> Hi Vikram, >> >> Is the analysis result specific to a loop nest or to a loop nest together with a set of reference groups? >> The result is specific to each loop in the loop nest and the calculations are based on the references in the loop nest. > > Sorry can you please rephrase/elaborate, I don’t understand what you mean. Analysis results are retained unless a transformation invalidates them. My question is how the reference groups affect all this. > > Sorry, I now understood that you meant llvm's analysis result (I thought how the analysis calculates the result). The analysis result is specific to each loop in its loop nest. Reference groups are necessary during cost computation only.Hi Vikram, Let me ask you differently. This analysis can be thought of as having two inputs, (1) a loop or loops (?) and (2) the reference group. If a transformation pass requires some analysis, the analysis result would have to be computed unless it has already been computed and no transformation has invalidated it since the last use. My question how does this apply to this pass. Consider this sequence of passes: 1. loop fusion of loops L1 and L2 Computes Loop Cost for L1 and L2 with the reference group that includes accesses from the both groups. Let’s assume here that the transformation does not occur at the end so the analysis result is still valid at this point. 2. loop distribution/fission for loop L1 Does this have to recompute the Loop Cost for L1 because now the reference group only includes accesses from L1? To put it different my question what the tag/key is when caching the analysis result. Is it the loop only or also the reference group. Adam> > You could probably describe how you envision this analysis would be used with something like Loop Fusion. > > For Loop Interchange, the cost calculated for each loop can provide a desired ordering of the loops. Loop interchange can start interchanging the loops to match the desired order iff interchange legality check passes. > Eg: For matrix multiplication case, > for (i = 0 to 5000) > for (j = 0 to 5000) > for (k = 0 to 5000) > C[i,j] = C[i,j] + A[i,k]* B[k,j] > Here, based on cache line size and the cache penalties for the references (please refer to first mail of the thread or the reference paper for different penalties), following costs result: > Loop i cost: 2.501750e+11 > Loop j cost: 6.256252e+10 > Loop k cost: 1.563688e+11 > Loop cost here is nothing but the total cache penalities due to all references, including penalties due to outer loops. So lower costing loop is a better fit for innermost loop because it has better cache locality. This would result in the desired loop order: i, k, j; and LoopInterchange can interchange 'j' and 'k' loops given the legality check passes (a nearest ordering for which legality is correct is also an optimal loop ordering). > > For fusion/fission, loop costs due to cache can be calculated by assigning cache penalties for references with respect to loops before and after fusion/fission. The costs will be a profitability measure for fusion/fission. > > I think use case description was very brief in a previous mail. So I have elaborated this time. More details can also be found in http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf <http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf>. > > Thanks > > Good time... > Vikram TV > CompilerTree Technologies > Mysore, Karnataka, INDIA-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160705/f108fb5e/attachment.html>
Vikram TV via llvm-dev
2016-Sep-07 10:56 UTC
[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis
Hi Adam, First, I sincerely apologize for stalling this thread. I understand your point about loop nest and reference groups. Both loop nest and the reference groups form the key to cache the analysis result. In this case, isn't it better not to cache anything and just compute the cost dynamically; based on loop nest and references provided as input by the calling transformation (fission/fusion/etc)? Thank you On Tue, Jul 5, 2016 at 11:39 PM, Adam Nemet <anemet at apple.com> wrote:> > On Jun 23, 2016, at 12:06 PM, Vikram TV <vikram.tarikere at gmail.com> wrote: > > > > On Thu, Jun 23, 2016 at 11:34 PM, Adam Nemet <anemet at apple.com> wrote: > >> >> >> >>> Hi Vikram, >>> >>> Is the analysis result specific to a loop nest or to a loop nest >>> together with a set of reference groups? >>> >> The result is specific to each loop in the loop nest and the calculations >> are based on the references in the loop nest. >> >> >> Sorry can you please rephrase/elaborate, I don’t understand what you >> mean. Analysis results are retained unless a transformation invalidates >> them. My question is how the reference groups affect all this. >> > > Sorry, I now understood that you meant llvm's analysis result (I thought > how the analysis calculates the result). The analysis result is specific to > each loop in its loop nest. Reference groups are necessary during cost > computation only. > > > Hi Vikram, > > Let me ask you differently. This analysis can be thought of as having two > inputs, (1) a loop or loops (?) and (2) the reference group. If a > transformation pass requires some analysis, the analysis result would have > to be computed unless it has already been computed and no transformation > has invalidated it since the last use. > > My question how does this apply to this pass. Consider this sequence of > passes: > > 1. loop fusion of loops L1 and L2 > > Computes Loop Cost for L1 and L2 with the reference group that includes > accesses from the both groups. > > Let’s assume here that the transformation does not occur at the end so the > analysis result is still valid at this point. > > > 2. loop distribution/fission for loop L1 > > Does this have to recompute the Loop Cost for L1 because now the reference > group only includes accesses from L1? > > To put it different my question what the tag/key is when caching the > analysis result. Is it the loop only or also the reference group. > > Adam > > >> You could probably describe how you envision this analysis would be used >> with something like Loop Fusion. >> > > For Loop Interchange, the cost calculated for each loop can provide a > desired ordering of the loops. Loop interchange can start interchanging the > loops to match the desired order iff interchange legality check passes. > Eg: For matrix multiplication case, > for (i = 0 to 5000) > for (j = 0 to 5000) > for (k = 0 to 5000) > C[i,j] = C[i,j] + A[i,k]* B[k,j] > Here, based on cache line size and the cache penalties for the > references (please refer to first mail of the thread or the reference paper > for different penalties), following costs result: > Loop i cost: 2.501750e+11 > Loop j cost: 6.256252e+10 > Loop k cost: 1.563688e+11 > Loop cost here is nothing but the total cache penalities due to all > references, including penalties due to outer loops. So lower costing loop > is a better fit for innermost loop because it has better cache locality. > This would result in the desired loop order: i, k, j; and LoopInterchange > can interchange 'j' and 'k' loops given the legality check passes (a > nearest ordering for which legality is correct is also an optimal loop > ordering). > > For fusion/fission, loop costs due to cache can be calculated by assigning > cache penalties for references with respect to loops before and after > fusion/fission. The costs will be a profitability measure for > fusion/fission. > > I think use case description was very brief in a previous mail. So I have > elaborated this time. More details can also be found in > http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf. > > Thanks > > Good time... > Vikram TV > CompilerTree Technologies > Mysore, Karnataka, INDIA > > >-- Good time... Vikram TV CompilerTree Technologies Mysore, Karnataka, INDIA -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160907/680179c5/attachment.html>