thr3ads.net - llvm dev - [LLVMdev] What opt pass attempts implements this optimization? [Oct 2009]

If this information is useful, please help other people find it:
Share via:

Villmow, Micah

2009-Oct-06 23:28 UTC

[LLVMdev] What opt pass attempts implements this optimization?

I have a very simple kernel that is generating very very bad code.

 

The basic kernel pseudo-code is as follows:

forloop(1 to n) {

  forloop(0 to j) {

  A

}

B

}

C

 

It is generating very ugly and inefficient code for a vector system
similar to the following pseudo-code:

if (n > 1) {

  if (j) {

                forloop(1 to n) {

                                forloop(0 to j) {

                                A

                                }

                                B

                }

               C

   } else {

                forloop(1 to n) {

                                B

                }

                C

   } 

} else {

C

}

 

I can understand how this would be good in a scalar system like x86, but
this is just bad on a vector system.

 

The reason this is bad because if a single branch is taken by a
work-item in a hardware thread(there are 64 work-items per hw thread),
then every single work-item in a hardware thread must execute that
branch. In this specific example, instead of every thread executing A, B
and C once, in the worst case(which is also happens 100% of the time),
every thread will execute C three times, B twice and A once. This also
does not take into account the cost of managing flow control on the
hardware, which is relatively expensive. This gets worse with the more
flow control I add in.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20091006/4b790bcf/attachment.html>

Dan Gohman

2009-Oct-07 04:35 UTC

head link

[LLVMdev] What opt pass attempts implements this optimization?

On Oct 6, 2009, at 4:28 PM, Villmow, Micah wrote:
> I have a very simple kernel that is generating very very bad code.
>
> The basic kernel pseudo-code is as follows:
> forloop(1 to n) {
>   forloop(0 to j) {
>   A
> }
> B
> }
> C
>
> It is generating very ugly and inefficient code for a vector system  
> similar to the following pseudo-code:
> if (n > 1) {
>   if (j) {
>                 forloop(1 to n) {
>                                 forloop(0 to j) {
>                                 A
>                                 }
>                                 B
>                 }
>                C
>    } else {
>                 forloop(1 to n) {
>                                 B
>                 }
>                 C
>    }
> } else {
> C
> }
It's possible that there's something going on with loop unswitching  
here. Do you
have a testcase which demonstrates this?

Dan

Villmow, Micah

2009-Oct-07 18:02 UTC

head link

[LLVMdev] What opt pass attempts implements this optimization?

Dan,
 After I sent out this email a co-worker pointed me to LoopUnswitch
optimization as being a possible culprit. We set it to zero and it no
longer generates all this code. I've attached optimized bc and
unoptimized bc for this specific example.

Micah
> -----Original Message-----
> From: Dan Gohman [mailto:gohman at apple.com]
> Sent: Tuesday, October 06, 2009 9:35 PM
> To: Villmow, Micah
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] What opt pass attempts implements this
> optimization?
> 
> 
> On Oct 6, 2009, at 4:28 PM, Villmow, Micah wrote:
> 
> > I have a very simple kernel that is generating very very bad code.
> >
> > The basic kernel pseudo-code is as follows:
> > forloop(1 to n) {
> >   forloop(0 to j) {
> >   A
> > }
> > B
> > }
> > C
> >
> > It is generating very ugly and inefficient code for a vector system
> > similar to the following pseudo-code:
> > if (n > 1) {
> >   if (j) {
> >                 forloop(1 to n) {
> >                                 forloop(0 to j) {
> >                                 A
> >                                 }
> >                                 B
> >                 }
> >                C
> >    } else {
> >                 forloop(1 to n) {
> >                                 B
> >                 }
> >                 C
> >    }
> > } else {
> > C
> > }
> 
> It's possible that there's something going on with loop unswitching
> here. Do you
> have a testcase which demonstrates this?
> 
> Dan
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AESEncryptDecrypt-opt.bc
Type: application/octet-stream
Size: 26732 bytes
Desc: AESEncryptDecrypt-opt.bc
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20091007/15ba61af/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AESEncryptDecrypt.bc
Type: application/octet-stream
Size: 7416 bytes
Desc: AESEncryptDecrypt.bc
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20091007/15ba61af/attachment-0001.obj>

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Oct 2009 - [LLVMdev] What opt pass attempts implements this optimization?

[LLVMdev] What opt pass attempts implements this optimization?

[LLVMdev] What opt pass attempts implements this optimization?

[LLVMdev] What opt pass attempts implements this optimization?

Apparently Analagous Threads