Hi all,
I want to use the scalarrepl pass to eliminate the allocation of %mat_alloc,
which is of type [4 x <4 x float>], in the following program.
$ cat test.ll
; ModuleID = 'test.ll'
define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
entry:
%inArg1 = load <4 x float>* %inArg
%mat_alloc = alloca [4 x <4 x float>]
%matVal = load [4 x <4 x float>]* %constants
store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
%0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0
%1 = load <4 x float>* %0
%2 = fmul <4 x float> %1, %inArg1
%3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1
%4 = load <4 x float>* %3
%5 = fmul <4 x float> %4, %inArg1
%6 = fadd <4 x float> %2, %5
%7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2
%8 = load <4 x float>* %7
%9 = fmul <4 x float> %8, %inArg1
%10 = fadd <4 x float> %6, %9
%11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3
%12 = load <4 x float>* %11
%13 = fadd <4 x float> %10, %12
%14 = getelementptr <4 x float>* %outArg, i32 1
store <4 x float> %13, <4 x float>* %14
ret void
}
$ opt -S -stats -scalarrepl test.ll
No transformation is performed. I've examined the source code of
scalarrepl, and it seems this pass does not handle array allocations. Is
there another transformation pass I can use to eliminate this allocation?
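For reference, here is a hand-written sketch of the result one would hope to get once %mat_alloc is eliminated. Nothing writes to memory between the aggregate copy and the four row loads (the only other store, to %outArg, comes after all of them), so each row can be read directly from %constants. This is not the output of any pass; it simply keeps the typed-pointer syntax of test.ll.

```llvm
; Hand-written target: @main from test.ll with %mat_alloc removed and
; every matrix row loaded straight from %constants.
define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
entry:
  %inArg1 = load <4 x float>* %inArg
  ; row 0
  %0 = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 0
  %1 = load <4 x float>* %0
  %2 = fmul <4 x float> %1, %inArg1
  ; row 1
  %3 = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 1
  %4 = load <4 x float>* %3
  %5 = fmul <4 x float> %4, %inArg1
  %6 = fadd <4 x float> %2, %5
  ; row 2
  %7 = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 2
  %8 = load <4 x float>* %7
  %9 = fmul <4 x float> %8, %inArg1
  %10 = fadd <4 x float> %6, %9
  ; row 3 (added without a multiply, as in the original)
  %11 = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 3
  %12 = load <4 x float>* %11
  %13 = fadd <4 x float> %10, %12
  %14 = getelementptr <4 x float>* %outArg, i32 1
  store <4 x float> %13, <4 x float>* %14
  ret void
}
```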
Thanks,
David
On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote:
> Is there other transformation pass I can use to eliminate this allocation?

Hi David,

ScalarRepl gets shy about loads and stores of the entire aggregate:

> %matVal = load [4 x <4 x float>]* %constants
> store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc

It is possible to generalize scalarrepl to handle these similarly to the way it handles memcpy, but no one has done that yet. Also, it's not generally recommended to load and store entire aggregates like this, because you'll get inefficient code from many parts of the optimizer and code generator.

-Chris
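Since scalarrepl already handles memcpy, one workaround is to express the copy that way. A minimal sketch of test.ll with the whole-aggregate load/store pair replaced by an llvm.memcpy call, using the five-argument (dest, src, len, align, isvolatile) form of the intrinsic from this era. The length of 64 bytes is 4 rows x 16 bytes; the 16-byte alignment is an assumption based on the ABI alignment of <4 x float> under the default data layout, which the original unannotated aggregate load already implied. Whether scalarrepl in a given release then removes %mat_alloc has not been checked here.

```llvm
; test.ll's @main with the %matVal load/store replaced by llvm.memcpy.
define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
entry:
  %inArg1 = load <4 x float>* %inArg
  %mat_alloc = alloca [4 x <4 x float>]
  ; copy all 64 bytes of the matrix into the stack slot
  %dst = bitcast [4 x <4 x float>]* %mat_alloc to i8*
  %src = bitcast [4 x <4 x float>]* %constants to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 64, i32 16, i1 false)
  %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0
  %1 = load <4 x float>* %0
  %2 = fmul <4 x float> %1, %inArg1
  %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1
  %4 = load <4 x float>* %3
  %5 = fmul <4 x float> %4, %inArg1
  %6 = fadd <4 x float> %2, %5
  %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2
  %8 = load <4 x float>* %7
  %9 = fmul <4 x float> %8, %inArg1
  %10 = fadd <4 x float> %6, %9
  %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3
  %12 = load <4 x float>* %11
  %13 = fadd <4 x float> %10, %12
  %14 = getelementptr <4 x float>* %outArg, i32 1
  store <4 x float> %13, <4 x float>* %14
  ret void
}

; 3.x-era signature: (dest, src, len, align, isvolatile)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1) nounwind
```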
Hi Chris,
Thanks for your reply.
You said that ScalarRepl gets shy about loads and stores of the entire
aggregate. So I tried this test case:
; ModuleID = 'test1.ll'
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
%stackArray = alloca <4 x i32>
%XC = bitcast i32* %X to <4 x i32>*
%arrayVal = load <4 x i32>* %XC
store <4 x i32> %arrayVal, <4 x i32>* %stackArray
%arrayVal1 = load <4 x i32>* %stackArray
%1 = extractelement <4 x i32> %arrayVal1, i32 1
ret i32 %1
}
$ opt -S -stats -scalarrepl test1.ll
; ModuleID = 'test1.ll'
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
%XC = bitcast i32* %X to <4 x i32>*
%arrayVal = load <4 x i32>* %XC
%1 = extractelement <4 x i32> %arrayVal, i32 1
ret i32 %1
}
===-------------------------------------------------------------------------==
... Statistics Collected ...
===-------------------------------------------------------------------------==
1 mem2reg - Number of alloca's promoted with a single store
1 scalarrepl - Number of allocas promoted
You can see that %stackArray is eliminated, even though there are loads
and stores of the entire aggregate.
However, the optimised code is still not optimal: I want it to load just
the one element it needs from %X instead of the whole array.
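The code David is after would look roughly like the following hand-written sketch: a getelementptr to element 1 of %X followed by a scalar i32 load, instead of loading the whole <4 x i32> and extracting from it. Vectors of byte-sized elements are laid out in memory like arrays, so element 1 of a <4 x i32> loaded from %X sits at byte offset 4, the same address as %X[1]. No claim is made here about which pass, if any, performs this narrowing in this LLVM release.

```llvm
; Hand-written target for @fun from test1.ll: read only element 1 of %X.
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
  %elt = getelementptr inbounds i32* %X, i32 1   ; address of %X[1]
  %val = load i32* %elt                          ; 4-byte load instead of 16
  ret i32 %val
}
```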
Thanks,
David
On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner <clattner at apple.com> wrote:
>
> ScalarRepl gets shy about loads and stores of the entire aggregate:
>
> > %matVal = load [4 x <4 x float>]* %constants
> > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc