Owen Anderson via llvm-dev
2015-Sep-22 17:39 UTC
[llvm-dev] [RFC] Refinement of convergent semantics
Hi Jingyue,

I consider it a very important element of the design of convergent that it does not require baseline LLVM to contain a definition of uniformity, which would itself pull in a definition of SIMT/SPMD, warps, threads, etc. The intention is that it should be a conservative (but hopefully not too conservative) approximation, and that implementations of specific GPU programming models (CUDA, OpenCL, individual GPU vendors, etc.) may layer more permissive semantics on top of it in code that is specific to that programming model.

—Owen

> On Sep 22, 2015, at 10:33 AM, Jingyue Wu <jingyue at google.com> wrote:
>
> Hi Owen,
>
> This is very interesting.
>
> How different is "convergent" from "uniform"? An instruction is uniform if threads in the same SIMT unit (e.g. warp) do not diverge when executing this instruction.
>
> I ask this because Bjarke recently came up with a mathematical definition of uniformity. I wonder if that is a foundation "convergent" needs as well. AFAICT, Bjarke's definition of "uniformity" is less restrictive than "convergent". For example, it allows loop unswitching the following code if "c" is uniform, which seems a case you ideally want to allow.
>
> DISALLOWED:
> for (…) {
>   if (c) { … }
>   convergent();
> }
>
> Jingyue
>
> On Fri, Sep 4, 2015 at 1:25 PM, Owen Anderson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Hi all,
>>
>> In light of recent discussions regarding updating passes to respect convergent semantics, and whether or not it is sufficient for barriers, I would like to propose a change in convergent semantics that should resolve a lot of the identified problems regarding loop unrolling, loop unswitching, etc. Credit to John McCall for talking this over with me and seeding the core ideas.
>>
>> Today, convergent operations may only be moved into control-equivalent locations, or, in layman's terms, a convergent operation may neither be sunk into nor hoisted out of a condition. This causes problems for full loop unrolling, as the control dependence on the loop counter is eliminated, but our intuition indicates that this dependence was somehow trivial. More concretely, all known uses of convergent are OK with full unrolling, making this semantic undesirable. Related problems arise in loop unswitching as well.
>>
>> The proposed change is to split the semantics of convergent into two annotations:
>>   convergent - this operation may not be made control dependent on any additional values (aka may not be sunk into a condition)
>>   nospeculate - this operation may not be added to any program trace on which it was not previously executed (same as notrap?)
>>
>> Most of today's convergent operations (barriers, arithmetic gradients) would continue to be marked only as convergent. The new semantics would allow full loop unrolling and provide clarity on which loop unswitching operations are allowed, examples below.
>>
>> The one case where nospeculate would also be needed is texture fetches that compute implicit gradients. Because the computed gradient forms part of the addressing mode, gibberish gradients here can cause invalid memory dereferences.
>>
>> —Owen
>>
>> ——————————————————
>>
>> Loop Unswitching Examples
>>
>> ALLOWED:
>> for (…) {
>>   if (c) { convergent(); }
>> }
>>
>> DISALLOWED:
>> for (…) {
>>   if (c) { … }
>>   convergent();
>> }
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
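The ALLOWED/DISALLOWED unswitching examples can be illustrated with a toy lockstep model. This is a sketch under stated assumptions, not LLVM's semantics: up to 32 lanes run in lockstep, a divergent branch runs each side with a reduced active mask, and each dynamic call to convergent() is summarized by the bitmask of lanes that execute it together. All function names are hypothetical. Under this model, unswitching the ALLOWED loop leaves every recorded mask unchanged, while unswitching the DISALLOWED loop changes the masks only when c differs between lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Toy lockstep model (assumption): bit i of a Mask means lane i is active.
using Mask = std::uint32_t;
// One entry per dynamic call to convergent(): the lanes that executed it together.
using Trace = std::vector<Mask>;

// Lanes whose per-lane condition c is true (assumes at most 32 lanes).
Mask lanesWhere(const std::vector<bool>& c) {
  Mask m = 0;
  for (std::size_t lane = 0; lane < c.size(); ++lane)
    if (c[lane]) m |= Mask{1} << lane;
  return m;
}

// Mask with the low n lanes active.
Mask allLanes(std::size_t n) {
  return n >= 32 ? ~Mask{0} : (Mask{1} << n) - 1;
}

// ALLOWED, original: for (...) { if (c) { convergent(); } }
Trace allowedOriginal(const std::vector<bool>& c, int iters) {
  Trace t;
  Mask taken = lanesWhere(c);
  for (int i = 0; i < iters; ++i)
    if (taken != 0) t.push_back(taken);  // only lanes with c reach the call
  return t;
}

// ALLOWED, unswitched: if (c) { for (...) { convergent(); } } else { for (...) {} }
Trace allowedUnswitched(const std::vector<bool>& c, int iters) {
  Trace t;
  Mask taken = lanesWhere(c);
  if (taken != 0)
    for (int i = 0; i < iters; ++i) t.push_back(taken);
  return t;
}

// DISALLOWED, original: for (...) { if (c) { ... } convergent(); }
Trace disallowedOriginal(const std::vector<bool>& c, int iters) {
  Trace t;
  for (int i = 0; i < iters; ++i)
    t.push_back(allLanes(c.size()));  // all lanes reconverge before the call
  return t;
}

// DISALLOWED, unswitched:
//   if (c) { for (...) { ...; convergent(); } } else { for (...) { convergent(); } }
Trace disallowedUnswitched(const std::vector<bool>& c, int iters) {
  Trace t;
  Mask taken = lanesWhere(c);
  Mask notTaken = allLanes(c.size()) & ~taken;
  for (Mask side : {taken, notTaken})  // divergent sides run one after another
    if (side != 0)
      for (int i = 0; i < iters; ++i) t.push_back(side);
  return t;
}
```

For example, with c = {true, false, true, false} the DISALLOWED loop records the full mask 0xF on every iteration before unswitching, but the split masks 0x5 and 0xA afterwards; with a uniform c the two traces match.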
Jingyue Wu via llvm-dev
2015-Sep-22 21:02 UTC
[llvm-dev] [RFC] Refinement of convergent semantics
The intention sounds reasonable. Given that they are common and motivate the convergent attribute, I don't object to introducing some implementation-independent SIMT concepts, though I understand that could be more controversial.

My general concern is still that some terms in this definition are not well-defined for arbitrary transformations, especially duplication. For example, regarding

> convergent - this operation may not be made control dependent on any additional values (aka may not be sunk into a condition)

is LLVM allowed to unroll

for (int i = 0; i < 4; ++i) {
  if (i < c) // c is loop invariant
    convergent();
}

to

if (0 < c)
  convergent();
if (1 < c)
  convergent();
if (2 < c)
  convergent();
if (3 < c)
  convergent();

?

Is "0 < c" considered "an additional value"? I'd vote no, but one can argue the other way.

One approach (Bjarke's idea) to work around such ambiguities is to define: a program is convergent-correct if everything marked convergent is indeed convergent. Then a transformation is convergent-correct unless it transforms a convergent-correct program into a convergent-incorrect one. However, defining "convergent-correct" involves SIMT concepts, which you want to avoid here.

On Tue, Sep 22, 2015 at 10:39 AM, Owen Anderson <resistor at mac.com> wrote:
> Hi Jingyue,
>
> I consider it a very important element of the design of convergent that it does not require baseline LLVM to contain a definition of uniformity, which would itself pull in a definition of SIMT/SPMD, warps, threads, etc. The intention is that it should be a conservative (but hopefully not too conservative) approximation, and that implementations of specific GPU programming models (CUDA, OpenCL, individual GPU vendors, etc.) may layer more permissive semantics on top of it in code that is specific to that programming model.
>
> —Owen
>
> On Sep 22, 2015, at 10:33 AM, Jingyue Wu <jingyue at google.com> wrote:
>
> Hi Owen,
>
> This is very interesting.
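Jingyue's unrolling question can be probed with a toy lockstep model. This is an assumption for illustration, not a formal definition of "additional value": lanes run in lockstep, c is loop invariant per lane but may differ between lanes, and each dynamic call to convergent() is summarized by the bitmask of lanes that reach it. The hypothetical helpers below compute that sequence for the rolled loop and for its fully unrolled form. Because the trip count of 4 is the same on every lane, both forms produce the same sequence, which is one way to read the intuition that "0 < c" adds no new control dependence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::uint32_t;       // bit i set <=> lane i is active (toy model, at most 32 lanes)
using Trace = std::vector<Mask>;  // active mask of each dynamic convergent() call, in order

// Lanes for which i < c holds; c is loop invariant per lane but may differ across lanes.
Mask lanesBelow(int i, const std::vector<int>& c) {
  Mask m = 0;
  for (std::size_t lane = 0; lane < c.size(); ++lane)
    if (i < c[lane]) m |= Mask{1} << lane;
  return m;
}

// Records a convergent() call only if at least one lane reaches it.
void callConvergent(Trace& t, Mask active) {
  if (active != 0) t.push_back(active);
}

// for (int i = 0; i < 4; ++i) { if (i < c) convergent(); }
Trace rolled(const std::vector<int>& c) {
  Trace t;
  for (int i = 0; i < 4; ++i)  // trip count is identical on every lane
    callConvergent(t, lanesBelow(i, c));
  return t;
}

// if (0 < c) convergent(); if (1 < c) convergent(); if (2 < c) convergent(); if (3 < c) convergent();
Trace unrolled(const std::vector<int>& c) {
  Trace t;
  callConvergent(t, lanesBelow(0, c));
  callConvergent(t, lanesBelow(1, c));
  callConvergent(t, lanesBelow(2, c));
  callConvergent(t, lanesBelow(3, c));
  return t;
}
```

With per-lane bounds c = {0, 2, 4, 1}, both forms record the masks 0xE, 0x6, 0x4, 0x4. The model says nothing about how to phrase this in a target-independent rule, which is exactly the formalization gap discussed in the thread.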
Owen Anderson via llvm-dev
2015-Sep-22 21:36 UTC
[llvm-dev] [RFC] Refinement of convergent semantics
> On Sep 22, 2015, at 2:02 PM, Jingyue Wu <jingyue at google.com> wrote:
>
> Is LLVM allowed to unroll
>
> for (int i = 0; i < 4; ++i) {
>   if (i < c) // c is loop invariant
>     convergent();
> }
>
> to
>
> if (0 < c)
>   convergent();
> if (1 < c)
>   convergent();
> if (2 < c)
>   convergent();
> if (3 < c)
>   convergent();
>
> ?
>
> Is "0 < c" considered "an additional value"? I'd vote no, but one can argue the other way.

My intuition agrees with you here, but I don't know how to formalize it.

> One approach (Bjarke's idea) to work around such ambiguities is to define: a program is convergent-correct if everything marked convergent is indeed convergent. Then a transformation is convergent-correct unless it transforms a convergent-correct program into a convergent-incorrect one. However, defining "convergent-correct" involves SIMT concepts, which you want to avoid here.

There are situations where this is over-conservative as well. Some convergent operations do not require uniformity per se. For example, a gradient operation in graphics programming models requires that all four threads in a given quad are either all executing or all not executing, though a warp is generally larger than that. Restricting convergent code motion to only be between uniform control flow points in the program would penalize that use case.

—Owen
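Owen's quad example can be made concrete with two predicates over a warp's active-lane mask. The sketch assumes a 32-lane warp in which each quad is four adjacent lanes (real hardware may group lanes differently), and the function names are hypothetical. It shows that a mask can satisfy the quad requirement for gradients while failing warp-wide uniformity, so a rule that only permits convergent code motion between warp-uniform points would reject motion that a gradient operation tolerates.

```cpp
#include <cassert>
#include <cstdint>

using Mask = std::uint32_t;  // bit i set <=> lane i of a 32-lane warp is active

// Warp-uniform: all 32 lanes are active, or none are.
bool warpUniform(Mask active) {
  return active == 0 || active == ~Mask{0};
}

// Quad-uniform: every quad (assumed to be lanes 4q..4q+3) is fully active or fully inactive.
bool quadUniform(Mask active) {
  for (int q = 0; q < 8; ++q) {
    Mask quad = (active >> (4 * q)) & 0xFu;
    if (quad != 0 && quad != 0xFu) return false;  // partially active quad
  }
  return true;
}
```

For example, a mask with only the low 16 lanes active (0x0000FFFF) is quad-uniform but not warp-uniform, while a mask of 0x00000007 breaks the quad requirement.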