Is there a transformation in LLVM that will perform loop fusion?
http://en.wikipedia.org/wiki/Loop_fusion
I have the following program, in which I would like the 2 loops
(iterating the same number of times) to be merged into 1, after which
other nice optimizations such as mem2reg will apply:
; ModuleID = 'test'
define void @vector([16 x float]* nocapture %arg, [16 x float]*
nocapture %ret) {
bb.nph12:
%0 = alloca [16 x float], align 4 ; <[16 x float]*>
[#uses=2]
br label %loop
loop: ; preds = %loop,
%bb.nph12
%indvar13 = phi i64 [ 0, %bb.nph12 ], [ %indvar.next14, %loop ] ;
<i64> [#uses=3]
%gep = getelementptr [16 x float]* %ret, i64 0, i64 %indvar13 ;
<float*> [#uses=1]
%gep1 = getelementptr [16 x float]* %0, i64 0, i64 %indvar13 ;
<float*> [#uses=1]
%load = load float* %gep ; <float> [#uses=1]
store float %load, float* %gep1
%indvar.next14 = add i64 %indvar13, 1 ; <i64> [#uses=2]
%exitcond15 = icmp eq i64 %indvar.next14, 16 ; <i1> [#uses=1]
br i1 %exitcond15, label %loop3, label %loop
loop3: ; preds = %loop, %loop3
%indvar = phi i64 [ %indvar.next, %loop3 ], [ 0, %loop ] ; <i64>
[#uses=3]
%gep6 = getelementptr [16 x float]* %0, i64 0, i64 %indvar ; <float*>
[#uses=1]
%gep8 = getelementptr [16 x float]* %arg, i64 0, i64 %indvar ;
<float*> [#uses=1]
%load7 = load float* %gep6 ; <float> [#uses=1]
store float %load7, float* %gep8
%indvar.next = add i64 %indvar, 1 ; <i64> [#uses=2]
%exitcond = icmp eq i64 %indvar.next, 16 ; <i1> [#uses=1]
br i1 %exitcond, label %end4, label %loop3
end4: ; preds = %loop3
ret void
}
I did find this note from 2004 that references a LoopFusion pass, but I can't find it in the source code: http://nondot.org/sabre/LLVMNotes/LoopOptimizerNotes.txt A little background - I'm working on a SIMD runtime, where I have a scalar program but need to execute it on multiple independent data elements. One approach is to generate a loop for each operation, then rely on the optimizer to merge the loops so that I get good locality. Does this sound like a reasonable approach in LLVM? If you're familiar with CUDA, I'm after something quite similar where the program is scalar but the runtime is vectorized. Any pointers in the right direction would be appreciated! Andrew On 09/07/2010 03:12 PM, Andrew Clinton wrote:> Is there a transformation in LLVM that will perform loop fusion? > http://en.wikipedia.org/wiki/Loop_fusion > > I have the following program, in which I would like the 2 loops > (iterating the same number of times) to be merged into 1, after which > other nice optimizations such as mem2reg will apply: > > ; ModuleID = 'test' > > define void @vector([16 x float]* nocapture %arg, [16 x float]* > nocapture %ret) { > bb.nph12: > %0 = alloca [16 x float], align 4 ;<[16 x float]*> > [#uses=2] > br label %loop > > loop: ; preds = %loop, > %bb.nph12 > %indvar13 = phi i64 [ 0, %bb.nph12 ], [ %indvar.next14, %loop ] ; > <i64> [#uses=3] > %gep = getelementptr [16 x float]* %ret, i64 0, i64 %indvar13 ; > <float*> [#uses=1] > %gep1 = getelementptr [16 x float]* %0, i64 0, i64 %indvar13 ; > <float*> [#uses=1] > %load = load float* %gep ;<float> [#uses=1] > store float %load, float* %gep1 > %indvar.next14 = add i64 %indvar13, 1 ;<i64> [#uses=2] > %exitcond15 = icmp eq i64 %indvar.next14, 16 ;<i1> [#uses=1] > br i1 %exitcond15, label %loop3, label %loop > > loop3: ; preds = %loop, %loop3 > %indvar = phi i64 [ %indvar.next, %loop3 ], [ 0, %loop ] ;<i64> > [#uses=3] > %gep6 = getelementptr [16 x float]* %0, i64 0, i64 %indvar ;<float*> > [#uses=1] > %gep8 = getelementptr [16 x float]* %arg, i64 0, i64 %indvar ; > <float*> [#uses=1] > %load7 = load float* %gep6 ;<float> [#uses=1] > store float %load7, float* %gep8 > %indvar.next = add i64 %indvar, 1 ;<i64> [#uses=2] > %exitcond = icmp eq i64 %indvar.next, 16 ;<i1> [#uses=1] > br i1 %exitcond, label %end4, label %loop3 > > end4: ; preds = %loop3 > ret void > } > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Andrew, There is not any transformation in LLVM that does loop fusion. I do not of anyone who is working on this. If you're interested to work on it then it'd be great! - Devang On Sep 8, 2010, at 9:34 AM, Andrew Clinton wrote:> I did find this note from 2004 that references a LoopFusion pass, but I > can't find it in the source code: > http://nondot.org/sabre/LLVMNotes/LoopOptimizerNotes.txt > > A little background - I'm working on a SIMD runtime, where I have a > scalar program but need to execute it on multiple independent data > elements. One approach is to generate a loop for each operation, then > rely on the optimizer to merge the loops so that I get good locality. > Does this sound like a reasonable approach in LLVM? If you're familiar > with CUDA, I'm after something quite similar where the program is scalar > but the runtime is vectorized. Any pointers in the right direction > would be appreciated! > > Andrew >