KARTHIK VENKATESH BHAT
2015-Feb-05 13:34 UTC
[LLVMdev] [RFC] LoopInterchange Pass for llvm
Hi All,
I have been working on a loop interchange pass for llvm. The motivation is to
improve the cache hit to improve performance.
Would like to get your inputs on the same.
Currently this pass only handles loop of depth 2 other loops are ignored. Going
forward we would like to fix this.
This opt is disabled by default.
Patch: http://reviews.llvm.org/D7432
LoopInterchange Pass is divided into 3 parts-
1) LoopInterchangeLegality
2) LoopInterchangeProfitability
3) LoopInterchangeTransform
LoopInterchangeLegality:
This class checks all the memory instructions in the loop and uses
DependenceAnalysis to conclude if we can interchange the loop or not.
LoopInterchangeLegality Functions:
canInterchangeLoops - Checks if the loops can be interchanged.
checkDependence - Called by canInterchangeLoops. Does the actual
DependenceAnalysis to conclude if we can interchange the loops
currentLimiations - This function marks loops are illegal due to current
limitation in the way the transform is written. I intend to fix these issues.
LoopInterchangeProfitability:
This class checks if it is profitable to interchange the loops. Currently i use
only 1 heuristic which is the order in which the array elements are accessed
(i.e. row major/column major) and count the good and bad order. If we have bad
order more than good order we interchange. We can improve the heuristics here
later on.
LoopInterchangeProfitability Functions:
isProfitable - Concludes if it is profitable to interchange.
getInstrOrderCost - Calculates the array access order heuristics.
LoopInterchangeTransform:
This transforms the loop and interchanges the inner loop with the outer loop.
I'm writing a loop optimization for the first time and have few doubts here.
The way we have interchanged the loop is -
1) Split the inner loop header and move all phi nodes into a seperate block.
This will be the new outer header.
2) Split the loop latch at the indvar.next instruction( This may need
improvement as indicated in a TODO in the code) and this will be the new outer
loop latch.
3) Adjust the loop links by adjusting the branch instructions so that we move
the inner loop outside.
4) Fix PHi nodes due to the step 3 as previous basicblock from which it
branched will have changed.
After this step the loop is interchanged. It gives the correct results for few
of the sample loops which i checked but I'm not sure if this is the right
way to perform interchange transform.
I suppose i will have to update the dom tree etc? Could someone please guide me
if this is the right way to go forward or we need to transform in some other
way?
I checked the transform with the following and some other examples it seems to
interchange the loops when legal and profitable.I checked the o/p with and
without the transform.
They seem to be the same in the cases which i have checked.
#include <iostream>
using namespace std;
int N,M;
int A[100][100],B[100][100];
int k;
int main() {
cin >> N >> M;
for(int i=0;i<N;i++)
for(int j=0;j<M;j++)
cin >> A[i][j];
for(int j=0;j<N;j++) {
for(int i=0;i<M;i++) {
if(i%2)
A[i+2][j+2] = A[i][j]+k;
else
A[i][j+1] = A[i][j]+k;
}
}
for(int i=0;i<N;i++)
for(int j=0;j<M;j++)
cout << A[i][j] <<"\n";
return 0;
}
Running loop-interchange-
opt -basicaa -loop-interchange test.ll -o test.bc
Awaiting comments and inputs.
Thanks and Regards
Karthik Bhat