Hi Krzysztof,

Yes, however this can be solved in one of two ways:

1) Fully inline the call graph for all leaf functions that call the barrier intrinsic. This is done on several implementations as standard already, and "no call stack" is a requirement for Karrenberg's algorithm at least.

2) Apply the "noclone" attribute transitively such that if a function may transitively call the barrier intrinsic, it is marked "noclone".

Either of these methods allows the user to stop LLVM "breaking" their IR. I'm aware that the general case with no user help (such as force-inlining, or otherwise controlling function cloning) is a very difficult problem. My intention is that there are no corner cases *with user assistance*. Currently there is no way to stop stuff breaking *even with* user assistance! :)

Cheers,

James
________________________________________
From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] On Behalf Of Krzysztof Parzyszek [kparzysz at codeaurora.org]
Sent: 01 December 2012 16:22
To: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] [RFC] "noclone" function attribute

On 12/1/2012 10:02 AM, James Molloy wrote:
>
> This means that cloning whole functions (CloneFunction and CloneFunctionInto) will still work [...].

Unfortunately, it won't work. Assume all threads call foo:

    foo() {
      ...
      bar(i)
      ...
    }

    bar(int i) {
      ...
      barrier();
      ...
    }

Now, suppose that we have discovered that bar(0) can be greatly optimized and generate a call to the specialized version, bar_0:

    foo() {
      ...
      if (i == 0)
        bar_0();
      else
        bar(i);
      ...
    }

And now we have multiple threads that no longer have a common barrier.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
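Option 2 in James's reply, marking every function that may transitively reach the barrier, can be sketched as a worklist walk over an inverted call graph. This is only a Python illustration under stated assumptions, not LLVM code: the dictionary-based `call_graph`, the name `functions_needing_noclone`, and the extra callee `helper` are hypothetical, while `foo`, `bar` and `barrier` are taken from Krzysztof's example.

```python
from collections import deque


def functions_needing_noclone(call_graph, barrier="barrier"):
    """Return the set of functions that may transitively call `barrier`.

    `call_graph` maps a caller name to the list of its direct callees
    (a hypothetical stand-in for a real call-graph analysis).
    """
    # Invert the graph: callee -> set of direct callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    marked = set()
    worklist = deque([barrier])
    while worklist:
        callee = worklist.popleft()
        for caller in callers.get(callee, ()):
            if caller not in marked:  # also terminates on recursive calls
                marked.add(caller)
                worklist.append(caller)
    return marked


# foo calls bar and an unrelated helper; only bar calls the barrier directly.
call_graph = {
    "foo": ["bar", "helper"],
    "bar": ["barrier"],
    "helper": [],
}
to_mark = functions_needing_noclone(call_graph)
```

In this sketch both `bar` and `foo` end up in `to_mark`, while `helper` does not, which matches the "transitively" wording of the proposal.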
On 12/1/2012 10:36 AM, James Molloy wrote:
>
> Either of these methods allows the user to stop LLVM "breaking" their IR. I'm aware that the general case with no user help (such as force-inlining, or otherwise controlling function cloning) is a very difficult problem. My intention is that there are no corner cases *with user assistance*. Currently there is no way to stop stuff breaking *even with* user assistance! :)

I'm not against the idea, I was just pointing out that cloning of functions can create problems.

Here's another thing. Imagine this code:

    if (x > 0) {
      barrier();
      ...
    } else {
      barrier();
      ...
    }

Even though "barrier" may have side-effects, normally, it would be possible to pull such a call out of the if-then-else statements. This cannot happen with barriers, so the attribute would also mean "don't-collapse" (as in "don't collapse multiple calls into one").

Here's another idea: internally translate calls to "barrier" without arguments into calls to "barrier" with an argument that uniquely identifies the call. The users wouldn't see it in their sources, and the compiler/runtime could handle it in its own way.

-Krzysztof
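The idea of giving every argument-less "barrier" call a unique identifying argument could look roughly like the sketch below. It is an illustration only: it rewrites source text with a regular expression in Python, whereas a compiler would presumably do this on its internal representation, and the name `number_barriers` and the choice of consecutive integers as identifiers are assumptions rather than anything proposed in the thread.

```python
import itertools
import re

# Matches a zero-argument call such as "barrier()" or "barrier( )".
_BARRIER_CALL = re.compile(r"\bbarrier\(\s*\)")


def number_barriers(source):
    """Rewrite each zero-argument barrier() call to carry a unique integer ID."""
    ids = itertools.count()
    return _BARRIER_CALL.sub(lambda match: f"barrier({next(ids)})", source)


kernel = "if (x > 0) { barrier(); ... } else { barrier(); ... }"
tagged = number_barriers(kernel)
print(tagged)
```

Any later copy of the tagged code would carry the same identifier as the call it was copied from, so clones of one source-level barrier stay recognizable. What the compiler or runtime then does with that identifier is left open in the message above.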
Hi,

> Here's another thing. Imagine this code:
>
> if (x > 0) {
>   barrier();
>   ...
> } else {
>   barrier();
>   ...
> }

That is actually fine. The spec is broken when you *split* a barrier call into two, not when you fuse two together. The spec says that all work-items must hit the same sequence of barriers - you can break that if you increase the number of barriers, but never if you fuse two together.

> Here's another idea: internally translate calls to "barrier" without
> arguments into calls to "barrier" with an argument that uniquely
> identifies the call. The users wouldn't see it in their sources, and
> the compiler/runtime could handle it in its own way.

Yep, I considered this. Problem is, "in its own way" for a CL GPU or CPU implementation following Karrenberg's model would be undoing the optimization that cloned the barrier. Either that or adding a call to another "fused barrier" function, which would be horribly messy and impossible on several GPU architectures (and, again, Karrenberg's CPU model).

Undoing these optimizations is a near-impossible problem. Inhibiting them to start with is easier.

Thanks for finding these issues/corner cases - I'm certain there may be one or two that I've missed and can't explain away...

Cheers,

James
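The "same sequence of barriers" rule from James's reply can be made concrete with a toy model. The sketch below is a Python illustration under a loudly stated assumption: work-items synchronize only with other work-items waiting at the same barrier call site, so each work-item is reduced to the list of call-site labels it reaches. All function names and labels are hypothetical; `bar` and `bar_0` echo the specialization example earlier in the thread.

```python
def barriers_are_uniform(trace_fn, inputs):
    """True if every work-item reaches the same sequence of barrier call sites."""
    traces = {tuple(trace_fn(value)) for value in inputs}
    return len(traces) <= 1


def branchy(x):
    # Original if/else: one barrier call site in each arm.
    return ["then"] if x > 0 else ["else"]


def fused(x):
    # After fusing: a single barrier call site hoisted out of the if/else.
    return ["fused"]


def unsplit(i):
    # foo() calling bar(i): a single barrier call site inside bar.
    return ["bar"]


def split(i):
    # foo() after specialization: bar_0 and bar each hold their own copy.
    return ["bar_0"] if i == 0 else ["bar"]


uniform_x = [5, 5, 5, 5]      # every work-item takes the same branch
work_item_ids = [0, 1, 2, 3]  # i differs per work-item

print(barriers_are_uniform(branchy, uniform_x))      # True
print(barriers_are_uniform(fused, uniform_x))        # True
print(barriers_are_uniform(unsplit, work_item_ids))  # True
print(barriers_are_uniform(split, work_item_ids))    # False
```

In this model fusing can only merge labels, so a program whose traces were uniform stays uniform, whereas splitting introduces a second label that some work-items reach and others do not.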
On 12/1/2012 10:51 AM, Krzysztof Parzyszek wrote:
>
> Here's another thing. Imagine this code:
>
> if (x > 0) {
>   barrier();
>   ...
> } else {
>   barrier();
>   ...
> }
>
> Even though "barrier" may have side-effects, normally, it would be
> possible to pull such a call out of the if-then-else statements. This
> cannot happen with barriers, so the attribute would also mean
> "don't-collapse" (as in "don't collapse multiple calls into one").

On second thought, this "commoning out" of barriers will most likely be OK.

-Krzysztof