thr3ads.net - llvm dev - [LLVMdev] loop fusion [Sep 2010]

If this information is useful, please help other people find it:
Share via:

Andrew Clinton

2010-Sep-07 19:12 UTC

[LLVMdev] loop fusion

Is there a transformation in LLVM that will perform loop fusion?
http://en.wikipedia.org/wiki/Loop_fusion

I have the following program, in which I would like the 2 loops 
(iterating the same number of times) to be merged into 1, after which 
other nice optimizations such as mem2reg will apply:

; ModuleID = 'test'

define void @vector([16 x float]* nocapture %arg, [16 x float]* 
nocapture %ret) {
bb.nph12:
   %0 = alloca [16 x float], align 4               ; <[16 x float]*> 
[#uses=2]
   br label %loop

loop:                                             ; preds = %loop, 
%bb.nph12
   %indvar13 = phi i64 [ 0, %bb.nph12 ], [ %indvar.next14, %loop ] ; 
<i64> [#uses=3]
   %gep = getelementptr [16 x float]* %ret, i64 0, i64 %indvar13 ; 
<float*> [#uses=1]
   %gep1 = getelementptr [16 x float]* %0, i64 0, i64 %indvar13 ; 
<float*> [#uses=1]
   %load = load float* %gep                        ; <float> [#uses=1]
   store float %load, float* %gep1
   %indvar.next14 = add i64 %indvar13, 1           ; <i64> [#uses=2]
   %exitcond15 = icmp eq i64 %indvar.next14, 16    ; <i1> [#uses=1]
   br i1 %exitcond15, label %loop3, label %loop

loop3:                                            ; preds = %loop, %loop3
   %indvar = phi i64 [ %indvar.next, %loop3 ], [ 0, %loop ] ; <i64> 
[#uses=3]
   %gep6 = getelementptr [16 x float]* %0, i64 0, i64 %indvar ; <float*> 
[#uses=1]
   %gep8 = getelementptr [16 x float]* %arg, i64 0, i64 %indvar ; 
<float*> [#uses=1]
   %load7 = load float* %gep6                      ; <float> [#uses=1]
   store float %load7, float* %gep8
   %indvar.next = add i64 %indvar, 1               ; <i64> [#uses=2]
   %exitcond = icmp eq i64 %indvar.next, 16        ; <i1> [#uses=1]
   br i1 %exitcond, label %end4, label %loop3

end4:                                             ; preds = %loop3
   ret void
}

Andrew Clinton

2010-Sep-08 16:34 UTC

head link

[LLVMdev] loop fusion

I did find this note from 2004 that references a LoopFusion pass, but I 
can't find it in the source code:
http://nondot.org/sabre/LLVMNotes/LoopOptimizerNotes.txt

A little background - I'm working on a SIMD runtime, where I have a 
scalar program but need to execute it on multiple independent data 
elements.  One approach is to generate a loop for each operation, then 
rely on the optimizer to merge the loops so that I get good locality.  
Does this sound like a reasonable approach in LLVM?  If you're familiar 
with CUDA, I'm after something quite similar where the program is scalar 
but the runtime is vectorized.  Any pointers in the right direction 
would be appreciated!

Andrew

On 09/07/2010 03:12 PM, Andrew Clinton wrote:> Is there a transformation in LLVM that will perform loop fusion?
> http://en.wikipedia.org/wiki/Loop_fusion
>
> I have the following program, in which I would like the 2 loops
> (iterating the same number of times) to be merged into 1, after which
> other nice optimizations such as mem2reg will apply:
>
> ; ModuleID = 'test'
>
> define void @vector([16 x float]* nocapture %arg, [16 x float]*
> nocapture %ret) {
> bb.nph12:
>     %0 = alloca [16 x float], align 4               ;<[16 x float]*>
> [#uses=2]
>     br label %loop
>
> loop:                                             ; preds = %loop,
> %bb.nph12
>     %indvar13 = phi i64 [ 0, %bb.nph12 ], [ %indvar.next14, %loop ] ;
> <i64>  [#uses=3]
>     %gep = getelementptr [16 x float]* %ret, i64 0, i64 %indvar13 ;
> <float*>  [#uses=1]
>     %gep1 = getelementptr [16 x float]* %0, i64 0, i64 %indvar13 ;
> <float*>  [#uses=1]
>     %load = load float* %gep                        ;<float> 
[#uses=1]
>     store float %load, float* %gep1
>     %indvar.next14 = add i64 %indvar13, 1           ;<i64>  [#uses=2]
>     %exitcond15 = icmp eq i64 %indvar.next14, 16    ;<i1>  [#uses=1]
>     br i1 %exitcond15, label %loop3, label %loop
>
> loop3:                                            ; preds = %loop, %loop3
>     %indvar = phi i64 [ %indvar.next, %loop3 ], [ 0, %loop ] ;<i64>
> [#uses=3]
>     %gep6 = getelementptr [16 x float]* %0, i64 0, i64 %indvar
;<float*>
> [#uses=1]
>     %gep8 = getelementptr [16 x float]* %arg, i64 0, i64 %indvar ;
> <float*>  [#uses=1]
>     %load7 = load float* %gep6                      ;<float> 
[#uses=1]
>     store float %load7, float* %gep8
>     %indvar.next = add i64 %indvar, 1               ;<i64>  [#uses=2]
>     %exitcond = icmp eq i64 %indvar.next, 16        ;<i1>  [#uses=1]
>     br i1 %exitcond, label %end4, label %loop3
>
> end4:                                             ; preds = %loop3
>     ret void
> }
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Devang Patel

2010-Sep-08 17:21 UTC

head link

[LLVMdev] loop fusion

Andrew,

There is not any transformation in LLVM that does loop fusion. I do not of
anyone who is working on this. If you're interested to work on it then
it'd be great!
-
Devang
On Sep 8, 2010, at 9:34 AM, Andrew Clinton wrote:
> I did find this note from 2004 that references a LoopFusion pass, but I 
> can't find it in the source code:
> http://nondot.org/sabre/LLVMNotes/LoopOptimizerNotes.txt
> 
> A little background - I'm working on a SIMD runtime, where I have a 
> scalar program but need to execute it on multiple independent data 
> elements.  One approach is to generate a loop for each operation, then 
> rely on the optimizer to merge the loops so that I get good locality.  
> Does this sound like a reasonable approach in LLVM?  If you're familiar
> with CUDA, I'm after something quite similar where the program is
scalar
> but the runtime is vectorized.  Any pointers in the right direction 
> would be appreciated!
> 
> Andrew
>

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Sep 2010 - [LLVMdev] loop fusion

[LLVMdev] loop fusion

[LLVMdev] loop fusion

[LLVMdev] loop fusion

Reasonably Related Threads