Villmow, Micah
2009-Oct-06 23:28 UTC
[LLVMdev] What opt pass attempts implements this optimization?
I have a very simple kernel that is generating very very bad code. The basic kernel pseudo-code is as follows: forloop(1 to n) { forloop(0 to j) { A } B } C It is generating very ugly and inefficient code for a vector system similar to the following pseudo-code: if (n > 1) { if (j) { forloop(1 to n) { forloop(0 to j) { A } B } C } else { forloop(1 to n) { B } C } } else { C } I can understand how this would be good in a scalar system like x86, but this is just bad on a vector system. The reason this is bad because if a single branch is taken by a work-item in a hardware thread(there are 64 work-items per hw thread), then every single work-item in a hardware thread must execute that branch. In this specific example, instead of every thread executing A, B and C once, in the worst case(which is also happens 100% of the time), every thread will execute C three times, B twice and A once. This also does not take into account the cost of managing flow control on the hardware, which is relatively expensive. This gets worse with the more flow control I add in. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091006/4b790bcf/attachment.html>
Dan Gohman
2009-Oct-07 04:35 UTC
[LLVMdev] What opt pass attempts implements this optimization?
On Oct 6, 2009, at 4:28 PM, Villmow, Micah wrote:> I have a very simple kernel that is generating very very bad code. > > The basic kernel pseudo-code is as follows: > forloop(1 to n) { > forloop(0 to j) { > A > } > B > } > C > > It is generating very ugly and inefficient code for a vector system > similar to the following pseudo-code: > if (n > 1) { > if (j) { > forloop(1 to n) { > forloop(0 to j) { > A > } > B > } > C > } else { > forloop(1 to n) { > B > } > C > } > } else { > C > }It's possible that there's something going on with loop unswitching here. Do you have a testcase which demonstrates this? Dan
Villmow, Micah
2009-Oct-07 18:02 UTC
[LLVMdev] What opt pass attempts implements this optimization?
Dan, After I sent out this email a co-worker pointed me to LoopUnswitch optimization as being a possible culprit. We set it to zero and it no longer generates all this code. I've attached optimized bc and unoptimized bc for this specific example. Micah> -----Original Message----- > From: Dan Gohman [mailto:gohman at apple.com] > Sent: Tuesday, October 06, 2009 9:35 PM > To: Villmow, Micah > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] What opt pass attempts implements this > optimization? > > > On Oct 6, 2009, at 4:28 PM, Villmow, Micah wrote: > > > I have a very simple kernel that is generating very very bad code. > > > > The basic kernel pseudo-code is as follows: > > forloop(1 to n) { > > forloop(0 to j) { > > A > > } > > B > > } > > C > > > > It is generating very ugly and inefficient code for a vector system > > similar to the following pseudo-code: > > if (n > 1) { > > if (j) { > > forloop(1 to n) { > > forloop(0 to j) { > > A > > } > > B > > } > > C > > } else { > > forloop(1 to n) { > > B > > } > > C > > } > > } else { > > C > > } > > It's possible that there's something going on with loop unswitching > here. Do you > have a testcase which demonstrates this? > > Dan >-------------- next part -------------- A non-text attachment was scrubbed... Name: AESEncryptDecrypt-opt.bc Type: application/octet-stream Size: 26732 bytes Desc: AESEncryptDecrypt-opt.bc URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091007/15ba61af/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: AESEncryptDecrypt.bc Type: application/octet-stream Size: 7416 bytes Desc: AESEncryptDecrypt.bc URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091007/15ba61af/attachment-0001.obj>