Hi Chris,
Thanks for your reply.
You said that ScalarRepl gets shy about loads and stores of the entire
aggregate, so I tried the following test case:
; ModuleID = 'test1.ll'
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
%stackArray = alloca <4 x i32>
%XC = bitcast i32* %X to <4 x i32>*
%arrayVal = load <4 x i32>* %XC
store <4 x i32> %arrayVal, <4 x i32>* %stackArray
%arrayVal1 = load <4 x i32>* %stackArray
%1 = extractelement <4 x i32> %arrayVal1, i32 1
ret i32 %1
}
$ opt -S -stats -scalarrepl test1.ll
; ModuleID = 'test1.ll'
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
%XC = bitcast i32* %X to <4 x i32>*
%arrayVal = load <4 x i32>* %XC
%1 = extractelement <4 x i32> %arrayVal, i32 1
ret i32 %1
}
===-------------------------------------------------------------------------==
... Statistics Collected ...
===-------------------------------------------------------------------------==
1 mem2reg - Number of alloca's promoted with a single store
1 scalarrepl - Number of allocas promoted
You can see that stackArray is eliminated, even though there are loads and
stores of the entire aggregate.
However, the optimised code is still not optimal. I want the code to load
just one element from X instead of the whole array.
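To make the goal concrete, here is a hand-written sketch of the output I am
hoping for. It is not produced by any pass I have tried; the name %elt is made
up for illustration, and it assumes that narrowing the vector load to a single
i32 load is acceptable here:

```llvm
; Hand-written target, not opt output. %elt is a hypothetical name.
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
; Address of element 1 of the <4 x i32> that X points to.
%elt = getelementptr inbounds i32* %X, i32 1
; Load only that element instead of the whole vector.
%1 = load i32* %elt
ret i32 %1
}
```

Element 1 of the vector sits one i32 past X, hence the GEP index of 1.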
Thanks,
David
On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner <clattner at apple.com> wrote:
>
> On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote:
>
> > Hi all,
> >
> > I want to use the scalarrepl pass to eliminate the allocation of
> > mat_alloc, which is of type [4 x <4 x float>], in the following program.
> >
> > $cat test.ll
> >
> > ; ModuleID = 'test.ll'
> >
> > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
> > entry:
> > %inArg1 = load <4 x float>* %inArg
> > %mat_alloc = alloca [4 x <4 x float>]
> > %matVal = load [4 x <4 x float>]* %constants
> > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
> > %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0
> > %1 = load <4 x float>* %0
> > %2 = fmul <4 x float> %1, %inArg1
> > %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1
> > %4 = load <4 x float>* %3
> > %5 = fmul <4 x float> %4, %inArg1
> > %6 = fadd <4 x float> %2, %5
> > %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2
> > %8 = load <4 x float>* %7
> > %9 = fmul <4 x float> %8, %inArg1
> > %10 = fadd <4 x float> %6, %9
> > %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3
> > %12 = load <4 x float>* %11
> > %13 = fadd <4 x float> %10, %12
> > %14 = getelementptr <4 x float>* %outArg, i32 1
> > store <4 x float> %13, <4 x float>* %14
> > ret void
> > }
> >
> > $ opt -S -stats -scalarrepl test.ll
> >
> > No transformation is performed. I've examined the source code of
> > scalarrepl. It seems this pass does not handle array allocations. Is there
> > another transformation pass I can use to eliminate this allocation?
>
> Hi David,
>
> ScalarRepl gets shy about loads and stores of the entire aggregate:
>
> > %matVal = load [4 x <4 x float>]* %constants
> > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
>
> It is possible to generalize scalarrepl to handle these similar to the way
> it handles memcpy, but no one has done that yet. Also, it's not generally
> recommended to do stuff like this, because you'll get inefficient code from
> many parts of the optimizer and code generator.
>
> -Chris
>
>