Connor Abbott via llvm-dev
2019-Jan-24 15:31 UTC
[llvm-dev] [RFC] Adding thread group semantics to LangRef (motivated by GPUs)
I don't see how this would fix the continue vs. nested loop problem I explained earlier. That is, how would this prevent turning: for (...) { ballot(); if (... /* non-uniform */) continue; } into for (...) { do { ballot(); } while (... /* non-uniform */); } and vice versa? Note that there's no duplication going on here, and the single-threaded flow of control is exactly the same. Another reason this isn't so great is that it prevents e.g. CSE on ballots that actually should be combined, since you're modelling it as a write. It seems like the optimizer is going to need some special knowledge of convergent things that fake memory constraints can't give us. On Thu, Jan 24, 2019 at 4:06 PM Jan Sjodin <jan_sjodin at yahoo.com> wrote:> > > I was looking into ballot() and how if it is possible to keep a single-threaded > view of the code, but add some extra conditions that must hold after the > transformation. I had the initial idea that each call to ballot() in a > single-threaded program can be seen as a partial write to a memory > location, and each location memory location is unique for every call site, > plus there some externally observable side effect. We can abstract this > away by tagging the calls, e.g. by using aliases. > > For example: > > if (...) { > foo1 = ballot(); > } else { > foo2 = ballot(); > } > > simply becomes: > > if (...) { > foo1 = ballot_1(); > } else { > foo2 = ballot_2(); > } > > > and > > if (...) { > } else { > } > ballot(); > > becomes > > if (...) { > } else { > } > ballot_1(); > > In the first case it would prevent combining the two calls into one > after the if. In the second example there is generally nothing that > says it could not be transformed into the first example with two > calls to ballot_1(), which should not be allowed. > > Another form of duplication that we must allow are loop transforms, > like unrolling or peeling. These might seem similar to the example > above, since we are cloning code and with conditions etc. But > they are different since they calls are in different loop iterations. > > The condition that needs to be met is that: > > There must be a single path between all cloned ballot_n() functions. > > The reason for this condition is that if we clone the same call, then > the copies must be mutually exclusive, but if they are cloned from > a loop, there must be a path, or we would skip iterations. > > If we want to be more sophisticated we can add: > > If there is no such path, the calls must be separated by uniform branches. > > After the transform everything should be re-tagged, since we already > checked the calls and we don't want to check them again. Also, not all > transforms need (or should) have the tagging checked. One example is > inlining, where multiple copies are created, but they are clearly different > calls. The tagging can be done temporarily for a single pass, and then > eliminated. This could be a good tool for debugging as well, since it can > detect if a transform is suspect. > > The code would of course have to make sense as far as control flow. If > we have: > > for(;;) { > if(...) { > ballot(); > break; > } > } > > This would have to be translated to: > > for(;;) { > if(...) { > ballot(); > if(UnknownConstant) { // Not a uniform condition, but later on translated to "true" > break; > } > } > > However, I think that this is the way the code is generated today anyway. > There would have to be some attribute that indicate that these calls (or functions) > contain ballot or other cross-lane operations so they could be tagged and > checked. The attribute would be used by the passes to know that the special > conditions exist for those calls. > > As far as what it means to have a path, it could be complicated. > For example: > > x = ... > ballot_1(); > > could be transformed to: > > if (x < 4711) { > ballot_1(); > > if(x >= 4711) { > ballot_1(); > } > > So a simple path check would say there is a path, and the transform is legal, > but if we examine the conditions, there is no path, and the transform should not be legal. > It could be made even more obscure of course, but I don't see any optimizations really > doing this kind of thing, > > - Jan > > On Saturday, December 29, 2018, 11:32:25 AM EST, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > On 20.12.18 18:03, Connor Abbott wrote: > > We already have the notion of "convergent" functions like > > syncthreads(), to which we cannot add control-flow dependencies. > > That is, it's legal to hoist syncthreads out of an "if", but it's > > not legal to sink it into an "if". It's not clear to me why we > > can't have "anticonvergent" (terrible name) functions which cannot > > have control-flow dependencies removed from them? ballot() would be > > both convergent and anticonvergent. > > > > Would that solve your problem? > > > > > > I think it's important to note that we already have such an attribute, > > although with the opposite sense - it's impossible to remove control > > flow dependencies from a call unless you mark it as "speculatable". > > This isn't actually true. If both sides of an if/else have the same > non-speculative function call, it can still be moved out of control flow. > > That's because doing so doesn't change anything at all from a > single-threaded perspective. Hence why I think we should model the > communication between threads honestly. > > > > However, this doesn't prevent > > > > if (...) { > > } else { > > } > > foo = ballot(); > > > > from being turned into > > > > if (...) { > > foo1 = ballot(); > > } else { > > foo2 = ballot(); > > } > > foo = phi(foo1, foo2) > > > > and vice versa. We have a "noduplicate" attribute which prevents > > transforming the first into the second, but not the other way around. Of > > course we could keep going this way and add a "nocombine" attribute to > > complement noduplicate. But even then, there are even still problematic > > transforms. For example, take this program, which is simplified from a > > real game that doesn't work with the AMDGPU backend: > > > > while (cond1 /* uniform */) { > > ballot(); > > ... > > if (cond2 /* non-uniform */) continue; > > ... > > } > > > > In SPIR-V, when using structured control flow, the semantics of this are > > pretty clearly defined. In particular, there's a continue block after > > the body of the loop where control flow re-converges, and the only back > > edge is from the continue block, so the ballot is in uniform control > > flow. But LLVM will get rid of the continue block since it's empty, and > > re-analyze the loop as two nested loops, splitting the loop header in > > two, producing a CFG which corresponds to this: > > > > while (cond1 /* uniform */) { > > do { > > ballot(); > > ... > > } while (cond2 /* non-uniform */); > > ... > > } > > > > Now, in an implementation where control flow re-converges at the > > immediate post-dominator, this won't do the right thing anymore. In > > order to handle it correctly, you'd effectively need to always flatten > > nested loops, which will probably be really bad for performance if the > > programmer actually wanted the second thing. It also makes it impossible > > when translating a high-level language to LLVM to get the "natural" > > behavior which game developers actually expect. This is exactly the sort > > of "spooky action at a distance" which makes me think that everything > > we've done so far is really insufficient, and we need to add an explicit > > notion of control-flow divergence and reconvergence to the IR. We need a > > way to say that control flow re-converges at the continue block, so that > > LLVM won't eliminate it, and we can vectorize it correctly without > > penalizing cases where it's better for control flow not to re-converge. > > Well said! > > Cheers, > > Nicolai > -- > Lerne, wie die Welt wirklich ist, > Aber vergiss niemals, wie sie sein sollte. > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Jan Sjodin via llvm-dev
2019-Jan-25 02:05 UTC
[llvm-dev] [RFC] Adding thread group semantics to LangRef (motivated by GPUs)
> for (...) { > ballot(); > if (... /* non-uniform */) continue; > } > > into > > for (...) { > do { > ballot(); > } while (... /* non-uniform */); > }I'm not sure if I follow this example, could you and explain a bit more?It looks to me that the condition in the "if" must be false (if thesame condition is used in the while), or we would call ballot the wrong number of times. About the CSE, when would that be legal? I can imagine with uniformbranches that it could work, but would like to see an example tofully understand this. I agree that it would be more conservative than if we model the threading, but I'm not sure about the cost/benefit. I am mostly curious if it is possible to have a single-thread view or not. Then we would have tosee if it is adequate. - Jan On Thursday, January 24, 2019, 10:31:47 AM EST, Connor Abbott <cwabbott0 at gmail.com> wrote: I don't see how this would fix the continue vs. nested loop problem I explained earlier. That is, how would this prevent turning: for (...) { ballot(); if (... /* non-uniform */) continue; } into for (...) { do { ballot(); } while (... /* non-uniform */); } and vice versa? Note that there's no duplication going on here, and the single-threaded flow of control is exactly the same. Another reason this isn't so great is that it prevents e.g. CSE on ballots that actually should be combined, since you're modelling it as a write. It seems like the optimizer is going to need some special knowledge of convergent things that fake memory constraints can't give us. On Thu, Jan 24, 2019 at 4:06 PM Jan Sjodin <jan_sjodin at yahoo.com> wrote:> > > I was looking into ballot() and how if it is possible to keep a single-threaded > view of the code, but add some extra conditions that must hold after the > transformation. I had the initial idea that each call to ballot() in a > single-threaded program can be seen as a partial write to a memory > location, and each location memory location is unique for every call site, > plus there some externally observable side effect. We can abstract this > away by tagging the calls, e.g. by using aliases. > > For example: > > if (...) { > foo1 = ballot(); > } else { > foo2 = ballot(); > } > > simply becomes: > > if (...) { > foo1 = ballot_1(); > } else { > foo2 = ballot_2(); > } > > > and > > if (...) { > } else { > } > ballot(); > > becomes > > if (...) { > } else { > } > ballot_1(); > > In the first case it would prevent combining the two calls into one > after the if. In the second example there is generally nothing that > says it could not be transformed into the first example with two > calls to ballot_1(), which should not be allowed. > > Another form of duplication that we must allow are loop transforms, > like unrolling or peeling. These might seem similar to the example > above, since we are cloning code and with conditions etc. But > they are different since they calls are in different loop iterations. > > The condition that needs to be met is that: > > There must be a single path between all cloned ballot_n() functions. > > The reason for this condition is that if we clone the same call, then > the copies must be mutually exclusive, but if they are cloned from > a loop, there must be a path, or we would skip iterations. > > If we want to be more sophisticated we can add: > > If there is no such path, the calls must be separated by uniform branches. > > After the transform everything should be re-tagged, since we already > checked the calls and we don't want to check them again. Also, not all > transforms need (or should) have the tagging checked. One example is > inlining, where multiple copies are created, but they are clearly different > calls. The tagging can be done temporarily for a single pass, and then > eliminated. This could be a good tool for debugging as well, since it can > detect if a transform is suspect. > > The code would of course have to make sense as far as control flow. If > we have: > > for(;;) { > if(...) { > ballot(); > break; > } > } > > This would have to be translated to: > > for(;;) { > if(...) { > ballot(); > if(UnknownConstant) { // Not a uniform condition, but later on translated to "true" > break; > } > } > > However, I think that this is the way the code is generated today anyway. > There would have to be some attribute that indicate that these calls (or functions) > contain ballot or other cross-lane operations so they could be tagged and > checked. The attribute would be used by the passes to know that the special > conditions exist for those calls. > > As far as what it means to have a path, it could be complicated. > For example: > > x = ... > ballot_1(); > > could be transformed to: > > if (x < 4711) { > ballot_1(); > > if(x >= 4711) { > ballot_1(); > } > > So a simple path check would say there is a path, and the transform is legal, > but if we examine the conditions, there is no path, and the transform should not be legal. > It could be made even more obscure of course, but I don't see any optimizations really > doing this kind of thing, > > - Jan > > On Saturday, December 29, 2018, 11:32:25 AM EST, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > On 20.12.18 18:03, Connor Abbott wrote: > > We already have the notion of "convergent" functions like > > syncthreads(), to which we cannot add control-flow dependencies. > > That is, it's legal to hoist syncthreads out of an "if", but it's > > not legal to sink it into an "if". It's not clear to me why we > > can't have "anticonvergent" (terrible name) functions which cannot > > have control-flow dependencies removed from them? ballot() would be > > both convergent and anticonvergent. > > > > Would that solve your problem? > > > > > > I think it's important to note that we already have such an attribute, > > although with the opposite sense - it's impossible to remove control > > flow dependencies from a call unless you mark it as "speculatable". > > This isn't actually true. If both sides of an if/else have the same > non-speculative function call, it can still be moved out of control flow. > > That's because doing so doesn't change anything at all from a > single-threaded perspective. Hence why I think we should model the > communication between threads honestly. > > > > However, this doesn't prevent > > > > if (...) { > > } else { > > } > > foo = ballot(); > > > > from being turned into > > > > if (...) { > > foo1 = ballot(); > > } else { > > foo2 = ballot(); > > } > > foo = phi(foo1, foo2) > > > > and vice versa. We have a "noduplicate" attribute which prevents > > transforming the first into the second, but not the other way around. Of > > course we could keep going this way and add a "nocombine" attribute to > > complement noduplicate. But even then, there are even still problematic > > transforms. For example, take this program, which is simplified from a > > real game that doesn't work with the AMDGPU backend: > > > > while (cond1 /* uniform */) { > > ballot(); > > ... > > if (cond2 /* non-uniform */) continue; > > ... > > } > > > > In SPIR-V, when using structured control flow, the semantics of this are > > pretty clearly defined. In particular, there's a continue block after > > the body of the loop where control flow re-converges, and the only back > > edge is from the continue block, so the ballot is in uniform control > > flow. But LLVM will get rid of the continue block since it's empty, and > > re-analyze the loop as two nested loops, splitting the loop header in > > two, producing a CFG which corresponds to this: > > > > while (cond1 /* uniform */) { > > do { > > ballot(); > > ... > > } while (cond2 /* non-uniform */); > > ... > > } > > > > Now, in an implementation where control flow re-converges at the > > immediate post-dominator, this won't do the right thing anymore. In > > order to handle it correctly, you'd effectively need to always flatten > > nested loops, which will probably be really bad for performance if the > > programmer actually wanted the second thing. It also makes it impossible > > when translating a high-level language to LLVM to get the "natural" > > behavior which game developers actually expect. This is exactly the sort > > of "spooky action at a distance" which makes me think that everything > > we've done so far is really insufficient, and we need to add an explicit > > notion of control-flow divergence and reconvergence to the IR. We need a > > way to say that control flow re-converges at the continue block, so that > > LLVM won't eliminate it, and we can vectorize it correctly without > > penalizing cases where it's better for control flow not to re-converge. > > Well said! > > Cheers, > > Nicolai > -- > Lerne, wie die Welt wirklich ist, > Aber vergiss niemals, wie sie sein sollte. > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190125/ff95716c/attachment-0001.html>
Jan Sjodin via llvm-dev
2019-Jan-25 14:13 UTC
[llvm-dev] [RFC] Adding thread group semantics to LangRef (motivated by GPUs)
I just remembered that I had put some thought about moving and eliminating ballots before. I was looking intousing cyclic equivalence classes, and that a ballot could only be moved to a location with the same equivalenceclass, and two ballot() could be merged. I didn't think too much about it after that, because there wasno obvious way to handle uniform branches, which is important. I will see if I can figure something out thatmakes sense. - Jan On Thursday, January 24, 2019, 9:06:14 PM EST, Jan Sjodin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > for (...) {> ballot(); > if (... /* non-uniform */) continue; > } > > into > > for (...) { > do { > ballot(); > } while (... /* non-uniform */); > }I'm not sure if I follow this example, could you and explain a bit more?It looks to me that the condition in the "if" must be false (if thesame condition is used in the while), or we would call ballot the wrong number of times. About the CSE, when would that be legal? I can imagine with uniformbranches that it could work, but would like to see an example tofully understand this. I agree that it would be more conservative than if we model the threading, but I'm not sure about the cost/benefit. I am mostly curious if it is possible to have a single-thread view or not. Then we would have tosee if it is adequate. - Jan On Thursday, January 24, 2019, 10:31:47 AM EST, Connor Abbott <cwabbott0 at gmail.com> wrote: I don't see how this would fix the continue vs. nested loop problem I explained earlier. That is, how would this prevent turning: for (...) { ballot(); if (... /* non-uniform */) continue; } into for (...) { do { ballot(); } while (... /* non-uniform */); } and vice versa? Note that there's no duplication going on here, and the single-threaded flow of control is exactly the same. Another reason this isn't so great is that it prevents e.g. CSE on ballots that actually should be combined, since you're modelling it as a write. It seems like the optimizer is going to need some special knowledge of convergent things that fake memory constraints can't give us. On Thu, Jan 24, 2019 at 4:06 PM Jan Sjodin <jan_sjodin at yahoo.com> wrote:> > > I was looking into ballot() and how if it is possible to keep a single-threaded > view of the code, but add some extra conditions that must hold after the > transformation. I had the initial idea that each call to ballot() in a > single-threaded program can be seen as a partial write to a memory > location, and each location memory location is unique for every call site, > plus there some externally observable side effect. We can abstract this > away by tagging the calls, e.g. by using aliases. > > For example: > > if (...) { > foo1 = ballot(); > } else { > foo2 = ballot(); > } > > simply becomes: > > if (...) { > foo1 = ballot_1(); > } else { > foo2 = ballot_2(); > } > > > and > > if (...) { > } else { > } > ballot(); > > becomes > > if (...) { > } else { > } > ballot_1(); > > In the first case it would prevent combining the two calls into one > after the if. In the second example there is generally nothing that > says it could not be transformed into the first example with two > calls to ballot_1(), which should not be allowed. > > Another form of duplication that we must allow are loop transforms, > like unrolling or peeling. These might seem similar to the example > above, since we are cloning code and with conditions etc. But > they are different since they calls are in different loop iterations. > > The condition that needs to be met is that: > > There must be a single path between all cloned ballot_n() functions. > > The reason for this condition is that if we clone the same call, then > the copies must be mutually exclusive, but if they are cloned from > a loop, there must be a path, or we would skip iterations. > > If we want to be more sophisticated we can add: > > If there is no such path, the calls must be separated by uniform branches. > > After the transform everything should be re-tagged, since we already > checked the calls and we don't want to check them again. Also, not all > transforms need (or should) have the tagging checked. One example is > inlining, where multiple copies are created, but they are clearly different > calls. The tagging can be done temporarily for a single pass, and then > eliminated. This could be a good tool for debugging as well, since it can > detect if a transform is suspect. > > The code would of course have to make sense as far as control flow. If > we have: > > for(;;) { > if(...) { > ballot(); > break; > } > } > > This would have to be translated to: > > for(;;) { > if(...) { > ballot(); > if(UnknownConstant) { // Not a uniform condition, but later on translated to "true" > break; > } > } > > However, I think that this is the way the code is generated today anyway. > There would have to be some attribute that indicate that these calls (or functions) > contain ballot or other cross-lane operations so they could be tagged and > checked. The attribute would be used by the passes to know that the special > conditions exist for those calls. > > As far as what it means to have a path, it could be complicated. > For example: > > x = ... > ballot_1(); > > could be transformed to: > > if (x < 4711) { > ballot_1(); > > if(x >= 4711) { > ballot_1(); > } > > So a simple path check would say there is a path, and the transform is legal, > but if we examine the conditions, there is no path, and the transform should not be legal. > It could be made even more obscure of course, but I don't see any optimizations really > doing this kind of thing, > > - Jan > > On Saturday, December 29, 2018, 11:32:25 AM EST, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > On 20.12.18 18:03, Connor Abbott wrote: > > We already have the notion of "convergent" functions like > > syncthreads(), to which we cannot add control-flow dependencies. > > That is, it's legal to hoist syncthreads out of an "if", but it's > > not legal to sink it into an "if". It's not clear to me why we > > can't have "anticonvergent" (terrible name) functions which cannot > > have control-flow dependencies removed from them? ballot() would be > > both convergent and anticonvergent. > > > > Would that solve your problem? > > > > > > I think it's important to note that we already have such an attribute, > > although with the opposite sense - it's impossible to remove control > > flow dependencies from a call unless you mark it as "speculatable". > > This isn't actually true. If both sides of an if/else have the same > non-speculative function call, it can still be moved out of control flow. > > That's because doing so doesn't change anything at all from a > single-threaded perspective. Hence why I think we should model the > communication between threads honestly. > > > > However, this doesn't prevent > > > > if (...) { > > } else { > > } > > foo = ballot(); > > > > from being turned into > > > > if (...) { > > foo1 = ballot(); > > } else { > > foo2 = ballot(); > > } > > foo = phi(foo1, foo2) > > > > and vice versa. We have a "noduplicate" attribute which prevents > > transforming the first into the second, but not the other way around. Of > > course we could keep going this way and add a "nocombine" attribute to > > complement noduplicate. But even then, there are even still problematic > > transforms. For example, take this program, which is simplified from a > > real game that doesn't work with the AMDGPU backend: > > > > while (cond1 /* uniform */) { > > ballot(); > > ... > > if (cond2 /* non-uniform */) continue; > > ... > > } > > > > In SPIR-V, when using structured control flow, the semantics of this are > > pretty clearly defined. In particular, there's a continue block after > > the body of the loop where control flow re-converges, and the only back > > edge is from the continue block, so the ballot is in uniform control > > flow. But LLVM will get rid of the continue block since it's empty, and > > re-analyze the loop as two nested loops, splitting the loop header in > > two, producing a CFG which corresponds to this: > > > > while (cond1 /* uniform */) { > > do { > > ballot(); > > ... > > } while (cond2 /* non-uniform */); > > ... > > } > > > > Now, in an implementation where control flow re-converges at the > > immediate post-dominator, this won't do the right thing anymore. In > > order to handle it correctly, you'd effectively need to always flatten > > nested loops, which will probably be really bad for performance if the > > programmer actually wanted the second thing. It also makes it impossible > > when translating a high-level language to LLVM to get the "natural" > > behavior which game developers actually expect. This is exactly the sort > > of "spooky action at a distance" which makes me think that everything > > we've done so far is really insufficient, and we need to add an explicit > > notion of control-flow divergence and reconvergence to the IR. We need a > > way to say that control flow re-converges at the continue block, so that > > LLVM won't eliminate it, and we can vectorize it correctly without > > penalizing cases where it's better for control flow not to re-converge. > > Well said! > > Cheers, > > Nicolai > -- > Lerne, wie die Welt wirklich ist, > Aber vergiss niemals, wie sie sein sollte. > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev _______________________________________________LLVM Developers mailing list llvm-dev at lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190125/60fe904e/attachment.html>
Connor Abbott via llvm-dev
2019-Jan-28 16:16 UTC
[llvm-dev] [RFC] Adding thread group semantics to LangRef (motivated by GPUs)
On Fri, Jan 25, 2019 at 3:05 AM Jan Sjodin <jan_sjodin at yahoo.com> wrote:> > > for (...) { > > ballot(); > > if (... /* non-uniform */) continue; > > } > > > > into > > > > for (...) { > > do { > > ballot(); > > } while (... /* non-uniform */); > > } > > I'm not sure if I follow this example, could you and explain a bit more? > It looks to me that the condition in the "if" must be false (if the > same condition is used in the while), or we would > call ballot the wrong number of times.Yes, the idea is that the same condition is used in the if and the do-while. I think I messed up the example a little... in the second snippet, we're supposed to break out of the inner loop if the outer loop's exit condition is true. Here's a more concrete example: for (int i = 0; i < 2; i++) { foo = ballot(true); // ballot 1 if (threadID /* ID of the thread within a wavefront/warp */ % 2 == 0) continue; bar = ballot(true); // ballot 2 } versus: int i = 0; while (true) { do { if (i == 2) break; foo = ballot(true); // ballot 1 i++; } while (threadID % 2 == 0); if (i == 2) break; bar = ballot(true); // ballot 2 i++; }>From a single-threaded perspective, these two snippets are identical, evenif ballot() writes to arbitrary memory. The first can easily get transformed to something like the second when LLVM decides to duplicate the final i++ through jump forwarding, and then re-interprets the loop as two nested loops and splits the loop header in two. This is what currently happens with DOOM when we try to enable subgroup operations with it. Let's say there are two threads in a wavefront. Then the execution trace mandated by SPIR-V for the first looks like: thread 0 | thread 1 ballot 1 = 0b11 | ballot 1 = 0b11 skipped | ballot 2 = 0b10 ballot 1 = 0b11 | ballot 1 = 0b11 skipped | ballot 2 = 0b10 Now, contrast this with the execution trace that programmers would expect for the second example: thread 0 | thread 1 ballot 1 = 0b11 | ballot 1 = 0b11 ballot 1 = 0b01 | skipped skipped | ballot 2 = 0b10 skipped | ballot 1 = 0b10 skipped | ballot 2 = 0b10 Nicolai's proposal solves this by having the frontend emit a merge intrinsic before the i++ is emitted. This prevents the jump forwarding from occurring.> > About the CSE, when would that be legal? I can imagine with uniform > branches that it could work, but would like to see an example to > fully understand this. > > I agree that it would be more conservative than if we model the threading, > but I'm not sure about the cost/benefit. I am mostly curious if it is > possible to have a single-thread view or not. Then we would have to > see if it is adequate. > > - Jan > > On Thursday, January 24, 2019, 10:31:47 AM EST, Connor Abbott <cwabbott0 at gmail.com> wrote:> > > I don't see how this would fix the continue vs. nested loop problem I > explained earlier. That is, how would this prevent turning: > > for (...) { > ballot(); > if (... /* non-uniform */) continue; > } > > into > > for (...) { > do { > ballot(); > } while (... /* non-uniform */); > } > > and vice versa? Note that there's no duplication going on here, and > the single-threaded flow of control is exactly the same. > > Another reason this isn't so great is that it prevents e.g. CSE on > ballots that actually should be combined, since you're modelling it as > a write. It seems like the optimizer is going to need some special > knowledge of convergent things that fake memory constraints can't give > us. > > On Thu, Jan 24, 2019 at 4:06 PM Jan Sjodin <jan_sjodin at yahoo.com> wrote: > > > > > > I was looking into ballot() and how if it is possible to keep asingle-threaded> > view of the code, but add some extra conditions that must hold after the > > transformation. I had the initial idea that each call to ballot() in a > > single-threaded program can be seen as a partial write to a memory > > location, and each location memory location is unique for every callsite,> > plus there some externally observable side effect. We can abstract this > > away by tagging the calls, e.g. by using aliases. > > > > For example: > > > > if (...) { > > foo1 = ballot(); > > } else { > > foo2 = ballot(); > > } > > > > simply becomes: > > > > if (...) { > > foo1 = ballot_1(); > > } else { > > foo2 = ballot_2(); > > } > > > > > > and > > > > if (...) { > > } else { > > } > > ballot(); > > > > becomes > > > > if (...) { > > } else { > > } > > ballot_1(); > > > > In the first case it would prevent combining the two calls into one > > after the if. In the second example there is generally nothing that > > says it could not be transformed into the first example with two > > calls to ballot_1(), which should not be allowed. > > > > Another form of duplication that we must allow are loop transforms, > > like unrolling or peeling. These might seem similar to the example > > above, since we are cloning code and with conditions etc. But > > they are different since they calls are in different loop iterations. > > > > The condition that needs to be met is that: > > > > There must be a single path between all cloned ballot_n() functions. > > > > The reason for this condition is that if we clone the same call, then > > the copies must be mutually exclusive, but if they are cloned from > > a loop, there must be a path, or we would skip iterations. > > > > If we want to be more sophisticated we can add: > > > > If there is no such path, the calls must be separated by uniformbranches.> > > > After the transform everything should be re-tagged, since we already > > checked the calls and we don't want to check them again. Also, not all > > transforms need (or should) have the tagging checked. One example is > > inlining, where multiple copies are created, but they are clearlydifferent> > calls. The tagging can be done temporarily for a single pass, and then > > eliminated. This could be a good tool for debugging as well, since itcan> > detect if a transform is suspect. > > > > The code would of course have to make sense as far as control flow. If > > we have: > > > > for(;;) { > > if(...) { > > ballot(); > > break; > > } > > } > > > > This would have to be translated to: > > > > for(;;) { > > if(...) { > > ballot(); > > if(UnknownConstant) { // Not a uniform condition, but later ontranslated to "true"> > break; > > } > > } > > > > However, I think that this is the way the code is generated todayanyway.> > There would have to be some attribute that indicate that these calls(or functions)> > contain ballot or other cross-lane operations so they could be taggedand> > checked. The attribute would be used by the passes to know that thespecial> > conditions exist for those calls. > > > > As far as what it means to have a path, it could be complicated. > > For example: > > > > x = ... > > ballot_1(); > > > > could be transformed to: > > > > if (x < 4711) { > > ballot_1(); > > > > if(x >= 4711) { > > ballot_1(); > > } > > > > So a simple path check would say there is a path, and the transform islegal,> > but if we examine the conditions, there is no path, and the transformshould not be legal.> > It could be made even more obscure of course, but I don't see anyoptimizations really> > doing this kind of thing, > > > > - Jan > > > > On Saturday, December 29, 2018, 11:32:25 AM EST, Nicolai Hähnle viallvm-dev <llvm-dev at lists.llvm.org> wrote:> > > > > > On 20.12.18 18:03, Connor Abbott wrote: > > > We already have the notion of "convergent" functions like > > > syncthreads(), to which we cannot add control-flow dependencies. > > > That is, it's legal to hoist syncthreads out of an "if", but it's > > > not legal to sink it into an "if". It's not clear to me why we > > > can't have "anticonvergent" (terrible name) functions which cannot > > > have control-flow dependencies removed from them? ballot() wouldbe> > > both convergent and anticonvergent. > > > > > > Would that solve your problem? > > > > > > > > > I think it's important to note that we already have such an attribute, > > > although with the opposite sense - it's impossible to remove control > > > flow dependencies from a call unless you mark it as "speculatable". > > > > This isn't actually true. If both sides of an if/else have the same > > non-speculative function call, it can still be moved out of controlflow.> > > > That's because doing so doesn't change anything at all from a > > single-threaded perspective. Hence why I think we should model the > > communication between threads honestly. > > > > > > > However, this doesn't prevent > > > > > > if (...) { > > > } else { > > > } > > > foo = ballot(); > > > > > > from being turned into > > > > > > if (...) { > > > foo1 = ballot(); > > > } else { > > > foo2 = ballot(); > > > } > > > foo = phi(foo1, foo2) > > > > > > and vice versa. We have a "noduplicate" attribute which prevents > > > transforming the first into the second, but not the other way around.Of> > > course we could keep going this way and add a "nocombine" attribute to > > > complement noduplicate. But even then, there are even stillproblematic> > > transforms. For example, take this program, which is simplified from a > > > real game that doesn't work with the AMDGPU backend: > > > > > > while (cond1 /* uniform */) { > > > ballot(); > > > ... > > > if (cond2 /* non-uniform */) continue; > > > ... > > > } > > > > > > In SPIR-V, when using structured control flow, the semantics of thisare> > > pretty clearly defined. In particular, there's a continue block after > > > the body of the loop where control flow re-converges, and the onlyback> > > edge is from the continue block, so the ballot is in uniform control > > > flow. But LLVM will get rid of the continue block since it's empty,and> > > re-analyze the loop as two nested loops, splitting the loop header in > > > two, producing a CFG which corresponds to this: > > > > > > while (cond1 /* uniform */) { > > > do { > > > ballot(); > > > ... > > > } while (cond2 /* non-uniform */); > > > ... > > > } > > > > > > Now, in an implementation where control flow re-converges at the > > > immediate post-dominator, this won't do the right thing anymore. In > > > order to handle it correctly, you'd effectively need to always flatten > > > nested loops, which will probably be really bad for performance if the > > > programmer actually wanted the second thing. It also makes itimpossible> > > when translating a high-level language to LLVM to get the "natural" > > > behavior which game developers actually expect. This is exactly thesort> > > of "spooky action at a distance" which makes me think that everything > > > we've done so far is really insufficient, and we need to add anexplicit> > > notion of control-flow divergence and reconvergence to the IR. Weneed a> > > way to say that control flow re-converges at the continue block, sothat> > > LLVM won't eliminate it, and we can vectorize it correctly without > > > penalizing cases where it's better for control flow not tore-converge.> > > > Well said! > > > > Cheers, > > > > Nicolai > > -- > > Lerne, wie die Welt wirklich ist, > > Aber vergiss niemals, wie sie sein sollte. > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190128/02e235d8/attachment.html>
Seemingly Similar Threads
- [RFC] Adding thread group semantics to LangRef (motivated by GPUs)
- [RFC] Adding thread group semantics to LangRef (motivated by GPUs)
- [RFC] Adding thread group semantics to LangRef (motivated by GPUs)
- [RFC] Adding thread group semantics to LangRef (motivated by GPUs)
- [RFC] Adding thread group semantics to LangRef (motivated by GPUs)