thr3ads.net - llvm dev - [LLVMdev] scalarrepl fails to promote array of vector [Mar 2012]

Fan Dawei

2012-Mar-12 03:35 UTC

[LLVMdev] scalarrepl fails to promote array of vector

Hi Chris,

Thanks for your reply.

You said that scalarRepl gets shy about loads and stores of the entire
aggregate. So I tried this test case:

; ModuleID = 'test1.ll'
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
  %stackArray = alloca <4 x i32>
  %XC = bitcast i32* %X to <4 x i32>*
  %arrayVal = load <4 x i32>* %XC
  store <4 x i32> %arrayVal, <4 x i32>* %stackArray
  %arrayVal1 = load <4 x i32>* %stackArray
  %1 = extractelement <4 x i32> %arrayVal1, i32 1
  ret i32 %1
}

$ opt -S -stats -scalarrepl test1.ll
; ModuleID = 'test1.ll'

define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
  %XC = bitcast i32* %X to <4 x i32>*
  %arrayVal = load <4 x i32>* %XC
  %1 = extractelement <4 x i32> %arrayVal, i32 1
  ret i32 %1
}
===-------------------------------------------------------------------------==  
... Statistics Collected ...
===-------------------------------------------------------------------------==
1 mem2reg    - Number of alloca's promoted with a single store
1 scalarrepl - Number of allocas promoted

You can see that the stackArray is eliminated, even though there are loads
and stores of the entire aggregate.

However, the optimised code is still not optimal. I want the code to load
just one element from X instead of the whole array.
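For reference, a hand-written version of @fun that does what is being asked for here might look like the sketch below. It uses the same pre-3.7 typed-pointer IR syntax as the rest of the thread and assumes the usual memory layout in which lane 1 of a <4 x i32> loaded from %X sits at byte offset 4, i.e. at X[1]. It is not the output of any particular opt pass, and the names %p and %elt are illustrative.

```llvm
; Hand-written target for test1.ll (not produced by opt): only X[1] is read.
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
  ; Address of the i32 that lane 1 of the <4 x i32> at %X occupies.
  %p = getelementptr inbounds i32* %X, i32 1
  ; A 4-byte scalar load replaces the 16-byte vector load + extractelement.
  %elt = load i32* %p
  ret i32 %elt
}
```

Narrowing is safe here because the original 16-byte load already implies that all four elements at %X are readable.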

Thanks,
David





On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner <clattner at apple.com> wrote:
>
> On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote:
>
> > Hi all,
> >
> > I want to use the scalarrepl pass to eliminate the allocation of mat_alloc,
> > which is of type [4 x <4 x float>], in the following program.
> >
> > $cat test.ll
> >
> > ; ModuleID = 'test.ll'
> >
> > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
> > entry:
> >   %inArg1 = load <4 x float>* %inArg
> >   %mat_alloc = alloca [4 x <4 x float>]
> >   %matVal = load [4 x <4 x float>]* %constants
> >   store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
> >   %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0
> >   %1 = load <4 x float>* %0
> >   %2 = fmul <4 x float> %1, %inArg1
> >   %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1
> >   %4 = load <4 x float>* %3
> >   %5 = fmul <4 x float> %4, %inArg1
> >   %6 = fadd <4 x float> %2, %5
> >   %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2
> >   %8 = load <4 x float>* %7
> >   %9 = fmul <4 x float> %8, %inArg1
> >   %10 = fadd <4 x float> %6, %9
> >   %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3
> >   %12 = load <4 x float>* %11
> >   %13 = fadd <4 x float> %10, %12
> >   %14 = getelementptr <4 x float>* %outArg, i32 1
> >   store <4 x float> %13, <4 x float>* %14
> >   ret void
> > }
> >
> > $ opt -S -stats -scalarrepl test.ll
> >
> > No transformation is performed. I've examined the source code of
> > scalarrepl; it seems this pass does not handle array allocations. Is there
> > another transformation pass I can use to eliminate this allocation?
>
> Hi David,
>
> ScalarRepl gets shy about loads and stores of the entire aggregate:
>
> >   %matVal = load [4 x <4 x float>]* %constants
> >   store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
>
> It is possible to generalize scalarrepl to handle these similar to the way
> it handles memcpy, but no one has done that yet.  Also, it's not generally
> recommended to do stuff like this, because you'll get inefficient code from
> many parts of the optimizer and code generator.
>
> -Chris
>
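Chris's point is that scalarrepl already knows how to deal with a memcpy into an alloca, but not with a first-class load/store of the whole [4 x <4 x float>]. One way to try that from the front-end side is to emit the copy in test.ll as a call to the llvm.memcpy intrinsic instead. The sketch below shows only the changed lines, in the LLVM 3.0-era form where llvm.memcpy still takes an explicit i32 alignment operand. The 16-byte alignment and the names %dst and %src are assumptions, and this thread does not confirm that a given opt version then promotes %mat_alloc.

```llvm
; Replaces the %matVal aggregate load and the aggregate store in @main.
  %mat_alloc = alloca [4 x <4 x float>], align 16
  %dst = bitcast [4 x <4 x float>]* %mat_alloc to i8*
  %src = bitcast [4 x <4 x float>]* %constants to i8*
  ; 64 bytes = 4 rows x 4 floats x 4 bytes; align 16 is assumed for both pointers.
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 64, i32 16, i1 false)

; Module-level declaration (operands: dest, src, length, alignment, isvolatile).
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind
```

The getelementptr/load/fmul sequence that reads rows from %mat_alloc stays as it is.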

Duncan Sands

2012-Mar-12 08:20 UTC

Hi Fan,
> You can see that the stackArray is eliminated,
I think you may be confusing arrays and vectors: there is no stack array in
your example, only the vector <4 x i32>.  As a general rule hardly any
optimization is done for loads and stores of arrays because front-ends don't
produce them much.  Much more effort is made for vectors because they can be
important for getting good performance.

Ciao, Duncan.
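To make Duncan's distinction concrete: the stack slot in test1.ll is a vector, a first-class value whose lanes are read with extractelement, whereas a stack array is memory whose elements are addressed with getelementptr and then loaded. The two functions below are hypothetical illustrations in the thread's IR syntax; @lane1 and @elem1 are made-up names.

```llvm
; A vector is a single SSA value: lane 1 is read directly with extractelement.
define i32 @lane1(<4 x i32> %v) {
  %e = extractelement <4 x i32> %v, i32 1
  ret i32 %e
}

; An array on the stack is memory: element 1 is addressed with getelementptr,
; then stored to and loaded from individually rather than as a whole array.
define i32 @elem1(i32 %x) {
  %arr = alloca [4 x i32]
  %p = getelementptr inbounds [4 x i32]* %arr, i32 0, i32 1
  store i32 %x, i32* %p
  %e = load i32* %p
  ret i32 %e
}
```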

Fan Dawei

2012-Mar-12 16:25 UTC

Thanks Duncan and Chris!

I solved this problem by adding the target data layout definition at the
beginning of the .ll source file. It seems that the optimization passes rely
on this information during the transformation; I'll look into why. All the
allocations, including the array of vectors in the previous examples,
are eliminated.
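The thread does not say which layout string David added. As an illustration only, a module header for x86-64 Linux as emitted by clang around LLVM 3.0/3.1 would look like the sketch below; the exact string is an assumption, and the right one for another target is easiest to obtain by compiling a trivial C file with clang -S -emit-llvm and copying its target lines.

```llvm
; ModuleID = 'test.ll'
; Assumed x86-64 Linux layout; v128:128:128 gives <4 x float> 16-byte alignment.
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; ... function definitions follow unchanged ...
```

A plausible reason this helps is that splitting an aggregate alloca needs type sizes and element offsets, which opt of that era took from the data layout; without it, scalarrepl appears to fall back to plain mem2reg-style promotion, which matches the mem2reg statistic in the first test.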

Now my compiler can generate pretty neat and efficient code. Thanks!
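For comparison, a hand-written version of @main from the first message with mat_alloc removed entirely is sketched below: each row is loaded straight from %constants, which is equivalent because nothing is stored between the original aggregate load and the row reads. It is not claimed to be opt's literal output, and all value names are illustrative.

```llvm
define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
entry:
  %in = load <4 x float>* %inArg
  ; Rows are read directly from %constants instead of from a stack copy.
  %r0.p = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 0
  %r0 = load <4 x float>* %r0.p
  %m0 = fmul <4 x float> %r0, %in
  %r1.p = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 1
  %r1 = load <4 x float>* %r1.p
  %m1 = fmul <4 x float> %r1, %in
  %s1 = fadd <4 x float> %m0, %m1
  %r2.p = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 2
  %r2 = load <4 x float>* %r2.p
  %m2 = fmul <4 x float> %r2, %in
  %s2 = fadd <4 x float> %s1, %m2
  ; As in the original, row 3 is added without being multiplied by %in.
  %r3.p = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 3
  %r3 = load <4 x float>* %r3.p
  %s3 = fadd <4 x float> %s2, %r3
  %out.p = getelementptr <4 x float>* %outArg, i32 1
  store <4 x float> %s3, <4 x float>* %out.p
  ret void
}
```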

Cheers!
David

Fan Dawei

2012-Mar-12 16:29 UTC

Thanks Duncan!

I solved this problem by adding the target data layout definition at the
beginning of the .ll source file. It seems that the optimization passes rely
on this information during the transformation. All the allocations, including
the array of vectors in the previous examples, are eliminated.

Now my compiler can generate pretty neat and efficient code. Thanks!

Cheers!
David
