On Fri, Oct 9, 2009 at 12:20 AM, Mon Ping Wang <monping at apple.com> wrote:
>
> The requirement in OpenCL is that all threads (work-items) must hit the
> same barrier. If one does what you have shown below, it is not legal
> because some threads may go through the block with S1 and other threads
> will go the other way. On some hardware, such a program will cause a
> hardware stall. If one is inlining, it is preferable to inline early,
> assuming the rest of the transformations don't mess with the barrier.
> Eli is correct that you can't duplicate calls to a function containing
> these kinds of barriers, for the same reasons. From the discussion so
> far, it would be nice to have a concept saying that you don't want to
> modify the control flow of a basic block containing such an execution
> barrier, or of a function containing such a barrier. This requires that
> all phases that do such optimizations be made aware of it. Such a
> concept may also be useful for other things, like inline assembly, where
> one may not want to duplicate a block.

It's probably worth noting that I wasn't proposing a general prohibition
of duplication; it would be okay for inlining or loop unrolling to
duplicate a call to a function marked executebarrier. It's not the same
sort of prohibition that one might want for inline assembly.

-Eli
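As a rough illustration of the "all work-items hit the same barrier" requirement, here is a small host-side C model. It is only a sketch: the transformation being discussed is not quoted in this message, so the code assumes it resembles duplicating a barrier into both arms of a branch. The names original_kernel, duplicated_kernel and group_converges, and the idea of numbering barrier call sites, are hypothetical and are not OpenCL or LLVM APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: every barrier call site gets an integer id, and a
 * "kernel" returns the id of the barrier site that one work-item reaches.
 * This is not real OpenCL; it only models the control flow. */

/* Assumed original shape: both arms rejoin before a single barrier. */
int original_kernel(bool cond) {
    if (cond) { /* S1 */ } else { /* S2 */ }
    return 0; /* barrier call site 0 */
}

/* Assumed shape after the barrier is duplicated into both arms. */
int duplicated_kernel(bool cond) {
    if (cond) {
        /* S1 */
        return 1; /* barrier call site 1 */
    } else {
        /* S2 */
        return 2; /* barrier call site 2 */
    }
}

/* Returns true when every work-item in the group reaches the same
 * barrier call site; conds[i] is the branch condition of work-item i. */
bool group_converges(int (*kernel)(bool), const bool *conds, size_t n) {
    for (size_t i = 1; i < n; ++i) {
        if (kernel(conds[i]) != kernel(conds[0])) {
            return false;
        }
    }
    return true;
}
```

With a group whose work-items disagree on cond, group_converges holds for original_kernel but not for duplicated_kernel, which is the situation described above as able to stall some hardware.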
Point taken :->. Inlining of functions containing these barriers is
required on some platforms. The only restriction is that any control flow
optimization must preserve the property that all threads will hit the
same barrier.

-- Mon Ping

On Oct 9, 2009, at 2:22 AM, Eli Friedman wrote:
> It's probably worth noting that I wasn't proposing a general
> prohibition of duplication; it would be okay for inlining or loop
> unrolling to duplicate a call to a function marked executebarrier.
> It's not the same sort of prohibition that one might want for inline
> assembly.
>
> -Eli
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
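To make the distinction concrete, here is a hedged C sketch of why duplication by loop unrolling can preserve the property. It assumes the loop trip count is uniform across the work-group; if it were not, even the rolled loop would violate the requirement. The trace structure, the site ids, and the function names are hypothetical modelling devices, not part of OpenCL or LLVM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRACE 16

/* Hypothetical record of the barrier call sites one work-item reaches,
 * in order. */
struct trace {
    int sites[MAX_TRACE];
    size_t len;
};

/* Append one barrier site id, silently dropping entries past MAX_TRACE. */
void hit(struct trace *t, int site) {
    if (t->len < MAX_TRACE) {
        t->sites[t->len++] = site;
    }
}

/* Rolled loop: a single barrier call site, executed once per iteration. */
void rolled(struct trace *t, int work_item, int trips) {
    (void)work_item; /* control flow does not depend on the work-item */
    for (int i = 0; i < trips; ++i) {
        hit(t, 0);
    }
}

/* Unrolled by two: the barrier call is duplicated into sites 1 and 2,
 * with site 3 handling a leftover iteration. */
void unrolled(struct trace *t, int work_item, int trips) {
    (void)work_item;
    int i = 0;
    for (; i + 1 < trips; i += 2) {
        hit(t, 1);
        hit(t, 2);
    }
    if (i < trips) {
        hit(t, 3);
    }
}

/* True when all work-items produce the same sequence of barrier sites. */
bool same_trace(void (*kernel)(struct trace *, int, int),
                int n_items, int trips) {
    struct trace first = {{0}, 0};
    kernel(&first, 0, trips);
    for (int w = 1; w < n_items; ++w) {
        struct trace t = {{0}, 0};
        kernel(&t, w, trips);
        if (t.len != first.len ||
            memcmp(t.sites, first.sites, t.len * sizeof(int)) != 0) {
            return false;
        }
    }
    return true;
}
```

The unrolled loop has more barrier call sites than the rolled one, but every work-item still walks the same sequence of sites, so the "same barrier" property survives the duplication.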
Are the platforms with no function calls the same ones that have
optimization-hostile barrier instructions? If the two sets of platforms
are disjoint, OpenCL implementers can use my or Devang's
noinline-function technique on the optimization-hostile platforms, and
inject a unique argument into the barrier() call in the frontend on the
no-function platforms.

On Fri, Oct 9, 2009 at 3:34 AM, Mon Ping Wang <monping at apple.com> wrote:
>
> Point taken :->. Inlining of functions containing these barriers is
> required on some platforms. The only restriction is that any control
> flow optimization must preserve the property that all threads will hit
> the same barrier.
>
> -- Mon Ping