Is inlining (which duplicates code) of functions containing OpenCL-style barriers legal?

Or, for example, suppose a change in phase ordering gives you:

    if (cond) {
      S1;
    }
    call user_func();   // user_func has a barrier buried inside it

You do tail splitting:

    if (cond) {
      S1;
      call user_func();
    } else {
      call user_func();
    }

Now you inline, and oops: now you might have a problem.

So do you want IPA to propagate the barrier bit to the call sites? You could do inlining before tail splitting, but that sounds messy...

Vinod

On Thu, Oct 8, 2009 at 8:38 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Thu, Oct 8, 2009 at 2:11 PM, Reid Kleckner <rnk at mit.edu> wrote:
> > IMO Jeff's solution is the cleanest, simplest way to get code that
> > works. Just generate a separate function for every barrier in the
> > program, and mark it noinline. This way the instruction pointers will
> > be unique to the barrier.
>
> No, this gets rather nasty: to support an instruction like this, it
> isn't legal to duplicate calls to functions containing a barrier
> instruction.
>
> Another proposal: add an executebarrier function attribute for
> functions which directly or indirectly contain an execution barrier,
> and adjust all the relevant transformation passes, like jump threading
> and loop unswitching, to avoid duplicating calls to such functions.
> This puts a slight burden on the frontend to mark functions
> appropriately, but I don't see any other solution which doesn't affect
> code which doesn't use execute barriers.
>
> -Eli
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
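Vinod's question about having IPA propagate the barrier bit, and Eli's proposed executebarrier attribute, both amount to a bottom-up closure over the call graph: a function is marked if it executes a barrier itself or calls something that is marked. A minimal sketch, assuming a hypothetical call graph stored as a map from function name to callee names (this is not LLVM's CallGraph API); `CallGraph` and `propagateBarrierBit` are illustrative names only:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns every function that directly or indirectly executes a barrier,
// i.e. the set a frontend or IPA pass would tag with the proposed
// executebarrier attribute. `direct` holds functions that contain a barrier
// instruction themselves. Iterating to a fixed point handles long call
// chains and recursion without needing a particular visiting order.
std::set<std::string> propagateBarrierBit(const CallGraph& cg,
                                          const std::set<std::string>& direct) {
  std::set<std::string> marked = direct;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& entry : cg) {
      const std::string& caller = entry.first;
      if (marked.count(caller)) continue;  // already known to reach a barrier
      for (const std::string& callee : entry.second) {
        if (marked.count(callee)) {        // calls a barrier-executing function
          marked.insert(caller);
          changed = true;
          break;
        }
      }
    }
  }
  return marked;
}
```

For example, if a hypothetical `kernel` calls `user_func` and `helper`, and only `do_barrier` (called from `user_func`) contains a barrier directly, the result is `{do_barrier, kernel, user_func}`: passes such as tail splitting or jump threading would then refuse to duplicate calls to `user_func`, while calls to `helper` stay unrestricted. A production version would more likely visit strongly connected components bottom-up instead of looping to a fixed point; the loop just keeps the sketch short.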
The requirement in OpenCL is that all threads (work-items) must hit the same barrier. What you have shown below is not legal, because some threads may go through the block with S1 while other threads go the other way. On some hardware, such a program will cause a hardware stall. If one is inlining, it is preferable to inline early, assuming the rest of the transformations don't mess with the barrier. Eli is correct that you can't duplicate calls to a function containing this kind of barrier, for the same reasons.

From the discussion so far, it would be nice to have a concept meaning that you don't want to modify the control flow of a basic block containing such an execution barrier, or of a function containing one. This requires that all phases that do such optimizations be made aware of it. Such a concept may also be useful for other things, like inline assembly, where one may not want to duplicate a block.

-- Mon Ping

On Oct 8, 2009, at 11:17 PM, Vinod Grover wrote:
> you do tail splitting
>
> if (cond) {
> S1;
> call user_func()
> } else {
> call user_func();
> }
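Mon Ping's suggestion implies a shared guard that every control-flow-duplicating pass would consult before cloning a block. A minimal sketch, assuming a hypothetical, simplified instruction model (`Kind`, `Instruction`, `BasicBlock`, and `canDuplicateBlock` are illustrative names, not LLVM classes) and a precomputed set of functions known to contain a barrier directly or indirectly:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified IR; not LLVM's actual classes.
enum class Kind { Plain, Barrier, Call, InlineAsm };

struct Instruction {
  Kind kind;
  std::string callee;  // only meaningful when kind == Kind::Call
};

using BasicBlock = std::vector<Instruction>;

// Guard that jump threading, tail duplication, loop unswitching, etc. would
// check before cloning `bb`. `barrierFns` lists functions that execute a
// barrier directly or indirectly. Every inline-asm instruction is treated as
// non-duplicable, which is deliberately conservative.
bool canDuplicateBlock(const BasicBlock& bb,
                       const std::set<std::string>& barrierFns) {
  for (const Instruction& inst : bb) {
    switch (inst.kind) {
      case Kind::Barrier:
      case Kind::InlineAsm:
        return false;  // never clone the barrier (or asm) itself
      case Kind::Call:
        if (barrierFns.count(inst.callee)) return false;  // barrier inside callee
        break;
      case Kind::Plain:
        break;
    }
  }
  return true;
}
```

Refusing all inline assembly is the bluntest option; a finer-grained design would let individual asm statements or call sites carry their own flag, since the thread only says one may not want to duplicate some such blocks.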
On Fri, Oct 9, 2009 at 12:20 AM, Mon Ping Wang <monping at apple.com> wrote:
> Eli is
> correct that you can't duplicate calls to a function containing these kind
> of barriers for the same reasons.

It's probably worth noting that I wasn't proposing a general prohibition on duplication; it would be okay for inlining or loop unrolling to duplicate a call to a function marked executebarrier. It's not the same sort of prohibition that one might want for inline assembly.

-Eli
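One way to see why loop unrolling can be exempt while tail splitting cannot: if each copy of a barrier is identified by its instruction, unrolling a loop whose trip count is the same for every work-item still has all work-items reach the same copies in the same order, whereas splitting on a work-item-dependent condition sends them to different copies. A minimal sketch under those assumptions; the `Trace` model, the barrier IDs, and the function names are hypothetical:

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: a work-item's execution is summarized as the sequence
// of static barrier copies (integer IDs) it reaches.
using Trace = std::vector<int>;

// If barriers are identified by instruction, a work-group is well behaved
// only when every work-item reaches the same copies in the same order.
bool allTracesMatch(const std::vector<Trace>& traces) {
  assert(!traces.empty());
  for (const Trace& t : traces)
    if (t != traces.front()) return false;
  return true;
}

// A loop containing one barrier, unrolled by 2: the two body copies carry
// barrier IDs 0 and 1, and the remainder copy carries ID 2. The trip count
// n is assumed uniform across the work-group.
Trace unrolledLoop(int n) {
  Trace t;
  int i = 0;
  for (; i + 1 < n; i += 2) {
    t.push_back(0);
    t.push_back(1);
  }
  if (i < n) t.push_back(2);  // odd trip count: one remainder iteration
  return t;
}

// Tail splitting of `if (cond) { S1; } user_func();` yields two calls; once
// user_func is inlined, its barrier is copy 0 on the taken path and copy 1
// on the other. `cond` may differ between work-items.
Trace tailSplit(bool cond) { return cond ? Trace{0} : Trace{1}; }
```

With four work-items and a trip count of 5, every work-item gets the trace `{0, 1, 0, 1, 2}`, so `allTracesMatch` holds for the unrolled loop; splitting on `lid % 2 == 0` gives traces `{0}` and `{1}`, so it does not.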
Vinod,

It depends on your reading of the spec. It states that if a work-item goes down a conditional path, then all work-items in a work-group must also go down that conditional path. So in my interpretation, the call to user_func() in the true branch produces a different barrier during execution than the call to user_func() in the false branch, even though they both come from the same line of source.

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Vinod Grover
Sent: Thursday, October 08, 2009 11:18 PM
To: Eli Friedman
Cc: Reid Kleckner; LLVM Developers Mailing List
Subject: Re: [LLVMdev] Instructions that cannot be duplicated

Is inlining (which duplicates code) of functions containing OpenCL style barriers legal?
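The disagreement in this last reply is about what counts as "the same barrier". A minimal sketch contrasting the two readings on the tail-split example, assuming barrier identity is keyed either by the source line a call was cloned from or by the call-site copy actually executed; `BarrierSite`, `Reading`, `reachedSite`, `sameBarrier`, and the placeholder line number are all hypothetical:

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the tail-split example: both copies of the call to
// user_func() come from the same source line but are distinct call sites.
struct BarrierSite {
  int callSite;    // which copy of the call ran (0 = then branch, 1 = else)
  int sourceLine;  // source line the copy was cloned from
};

enum class Reading { BySourceLine, ByCallSite };

// Site a work-item reaches after tail splitting; `cond` may differ per
// work-item. 42 is an arbitrary placeholder line number.
BarrierSite reachedSite(bool cond) {
  const int kUserFuncLine = 42;
  return cond ? BarrierSite{0, kUserFuncLine} : BarrierSite{1, kUserFuncLine};
}

// True if every work-item reaches "the same barrier" under the chosen reading.
bool sameBarrier(const std::vector<bool>& conds, Reading r) {
  assert(!conds.empty());
  auto key = [r](const BarrierSite& s) {
    return r == Reading::BySourceLine ? s.sourceLine : s.callSite;
  };
  const int first = key(reachedSite(conds.front()));
  for (bool c : conds)
    if (key(reachedSite(c)) != first) return false;
  return true;
}
```

With `conds = {true, false, true, false}`, the work-group reaches the same barrier under `Reading::BySourceLine` but not under `Reading::ByCallSite`. The reply argues for the call-site reading, under which the split code leaves work-items waiting at different barriers.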