James Molloy via llvm-dev
2015-Aug-14 07:56 UTC
[llvm-dev] [LLVMdev] RFC: Convergent attribute
Hi Mehdi,

My reading of it is that if you have a convergent instruction A, it is legal to duplicate it to instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A.

James

On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi Jingyue,
>
> Convergent is not intended to prevent inlining. It’s tricky to formalize
> this inter-procedurally, but the intended interpretation is that a
> convergent operation cannot be moved either into or out of a conditionally
> executed region. Normal inlining would not violate that.
>
> I would imagine that it would make sense to use a combination of
> convergent and noduplicate for barrier-like operations.
>
>
> Isn’t convergent implying “noduplicate” inside a function?
> It’s late, but I’m not sure I can figure out when a transformation would be
> allowed to duplicate a call to a convergent intrinsic?
>
> --
> Mehdi
>
>
> --Owen
>
> On Aug 13, 2015, at 3:12 PM, Jingyue Wu <jingyue at google.com> wrote:
>
> Hi Owen,
>
> According to your design, is LLVM supposed to (partially) disallow
> inlining a function that has convergent instructions? It's hard to define
> control equivalence inter-procedurally. For example, if a function
> containing a convergent instruction is called at two call sites, inlining
> the function produces two convergent instructions. Neither of the two is
> control equivalent to the original, but combined they are in some sense.
>
> I came across this when I was thinking about whether __syncthreads in CUDA
> should be tagged "convergent". Right now, it's tagged as noduplicate, so
> inlining and loop unrolling are disallowed. But I think noduplicate is too
> strong for the semantics of convergent.
>
> Jingyue
>
> On Wed, May 13, 2015 at 1:17 PM, Owen Anderson <resistor at mac.com> wrote:
>
>> Below is a proposal for a new "convergent" intrinsic attribute and
>> MachineInstr property, needed for correctly modeling many SPMD/SIMT
>> programming models in LLVM. Comments and feedback welcome.
>>
>> --Owen
>>
>>
>> In order to make LLVM more suitable for programming models variously
>> called SPMD and SIMT, we would like to propose a new intrinsic and
>> MachineInstr annotation called "convergent", which will be used to impose
>> certain control flow and/or code motion constraints that are necessary
>> for the correct compilation of some common constructs in these
>> programming models.
>>
>> Our proposal strives to define the semantics of these annotations
>> *without* introducing a definition of SPMD/SIMT programming models into
>> LLVM IR. Rather, the properties that must be preserved are specified
>> purely in terms of single thread semantics. This allows pass authors to
>> reason about the constraints without having to consider alternative
>> programming models. The downside to this approach is that the motivation
>> and necessity of these constraints are not easily understood without
>> understanding the programming model from which they derive.
>>
>> *** WHAT ***
>>
>> (Thanks to Phil Reames for input on this definition.)
>>
>> An operation marked convergent may be transformed or moved within the
>> program if and only if the post-transform placement of the convergent
>> operation is control equivalent (A dominates B and B post-dominates A,
>> or vice versa) to its original position.
>>
>> This definition is overly strict with respect to some SPMD/SIMT models,
>> but cannot be relaxed without introducing a specific model into LLVM IR.
>> We believe it is important for LLVM itself to remain agnostic to any
>> specific model. This allows core passes to preserve correctness for
>> stricter models, while more relaxed models can implement additional
>> transforms that use weaker constraints on top of core LLVM.
>>
>> *** HOW ***
>>
>> Once the attribute has been added, we anticipate the following changes to
>> optimization passes will be required:
>> - Restricting Sink and MachineSink for convergent operations
>> - Disabling PRE for convergent operations
>> - Disabling jump threading of convergent operations
>> - Auditing SimplifyCFG for additional transforms that break convergent
>>   guarantees
>>
>> *** WHY ***
>>
>> SPMD/SIMT programming models are a family of related programming models in
>> which multiple threads execute in a per-instruction lockstep fashion.
>> Predication is typically used to implement acyclic control flow that would
>> otherwise diverge the PC address of the lockstep threads.
>>
>> In these models, each thread's register set is typically independent, but
>> there exist a small number of important circumstances in which a thread
>> may access register storage from one of its lockstep neighbors. Examples
>> include gradient computation for texture lookups, as well as cross-thread
>> broadcast and shuffle operations.
>>
>> These operations that provide access to another thread's register storage
>> pose a particular challenge to the compiler, particularly when combined
>> with the use of predication for control flow. Consider the following
>> example:
>>
>>     // texture lookup that computes gradient of r0, last use of r0
>>     r1 = texture2D(..., r0, ...)
>>     if (...) {
>>       // r0 used as temporary here
>>       r0 = ...
>>       r2 = r0 + ...
>>     } else {
>>       // only use of r1
>>       r2 = r1 + ...
>>     }
>>
>> In this example, various optimizations might try to sink the texture2D
>> operation into the else block, like so:
>>
>>     if (...) {
>>       r0 = ...
>>       r2 = r0 + ...
>>     } else {
>>       r1 = texture2D(..., r0, ...)
>>       r2 = r1 + ...
>>     }
>>
>> At this point, it starts to become clear that a problem can occur when two
>> neighbor threads want to take different paths through the if-else
>> construct. Logically, the thread that wishes to execute the texture2D
>> races with its neighbor to read the neighbor's value of r0 before it gets
>> overwritten.
>>
>> In most SPMD/SIMT implementations, the fallout of this race is exposed via
>> the predicated expression of acyclic control flow:
>>
>>     pred0 <- cmp ...
>>     if (pred0) r0 = ...
>>     if (pred0) r2 = r0 + ...
>>     if (!pred0) r1 = texture2D(..., r0, ...)
>>     if (!pred0) r2 = r1 + ...
>>
>> If thread 0 takes the else path and performs the texture2D operation, but
>> its neighbor thread 1 takes the then branch, then the texture2D will fail
>> because thread 1 has already overwritten its value of r0 before thread 0
>> has a chance to read it.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org http://llvm.cs.uiuc.edu
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
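The race described at the end of the proposal can be reproduced in a toy model. What follows is only a minimal sketch, assuming two lockstep lanes that execute every predicated statement in order. The names `Lane`, `crossLaneGradient`, `runHoisted`, `runSunk` and `exampleLanes` are hypothetical, and `crossLaneGradient` is a stand-in for texture2D, not a real texture or LLVM API.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of one lockstep lane: private registers plus its predicate.
struct Lane { int r0; int r1; int r2; bool pred; };

// Stand-in for texture2D: reads the neighbor lane's r0 to form a "gradient".
int crossLaneGradient(const std::array<Lane, 2>& lanes, int self) {
    const int other = 1 - self;
    return lanes[other].r0 - lanes[self].r0;
}

// Original placement: the cross-lane read happens before the branch,
// so every lane still holds its original r0 when it is read.
std::array<Lane, 2> runHoisted(std::array<Lane, 2> lanes) {
    for (int i = 0; i < 2; ++i) lanes[i].r1 = crossLaneGradient(lanes, i);
    for (auto& l : lanes) if (l.pred) l.r0 = 100;         // then: r0 = ...
    for (auto& l : lanes) if (l.pred) l.r2 = l.r0 + 1;    // then: r2 = r0 + ...
    for (auto& l : lanes) if (!l.pred) l.r2 = l.r1 + 1;   // else: r2 = r1 + ...
    return lanes;
}

// Sunk placement: the cross-lane read is predicated on !pred and runs
// after the "then" lanes have already overwritten their r0.
std::array<Lane, 2> runSunk(std::array<Lane, 2> lanes) {
    for (auto& l : lanes) if (l.pred) l.r0 = 100;
    for (auto& l : lanes) if (l.pred) l.r2 = l.r0 + 1;
    for (int i = 0; i < 2; ++i)
        if (!lanes[i].pred) lanes[i].r1 = crossLaneGradient(lanes, i);
    for (auto& l : lanes) if (!l.pred) l.r2 = l.r1 + 1;
    return lanes;
}

// Lane 0 takes the else path, lane 1 takes the then path.
std::array<Lane, 2> exampleLanes() {
    return {{ {10, 0, 0, false}, {14, 0, 0, true} }};
}
```

With these inputs, lane 0 computes a gradient of 4 in the hoisted version but sees the neighbor's overwritten r0 in the sunk version, which is the failure the proposal describes. The fixed statement order is an assumption of the model, not a claim about any particular hardware.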
Mehdi Amini via llvm-dev
2015-Aug-14 23:28 UTC
[llvm-dev] [LLVMdev] RFC: Convergent attribute
Hi James,

That sounds reasonable to me. Any idea of a transformation that would want to do that though?

--
Mehdi

> On Aug 14, 2015, at 12:56 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
>
> Hi Mehdi,
>
> My reading of it is that if you have a convergent instruction A, it is legal to duplicate it to instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A.
>
> James
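James's condition (A dominates B and B post-dominates A, or the reverse) can be checked mechanically on a small control-flow graph. The sketch below is an illustration only: it uses a hand-rolled bitmask dominator computation on a hypothetical `Graph` type rather than LLVM's own dominator-tree classes, and it assumes at most 32 blocks, all reachable from the entry and all able to reach the exit. The helper `diamond` builds the if/else shape from the proposal's texture2D example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy CFG: blocks are numbered 0..n-1; g[b] lists b's successors.
using Graph = std::vector<std::vector<int>>;

Graph reverseGraph(const Graph& g) {
    Graph r(g.size());
    for (int b = 0; b < static_cast<int>(g.size()); ++b)
        for (int s : g[b]) r[s].push_back(b);
    return r;
}

// Iterative dominator sets as bitmasks: bit d of dom[b] is set if d dominates b.
std::vector<uint32_t> dominators(const Graph& succ, int root) {
    const int n = static_cast<int>(succ.size());
    const Graph pred = reverseGraph(succ);
    const uint32_t all = (n == 32) ? ~0u : ((1u << n) - 1u);
    std::vector<uint32_t> dom(n, all);
    dom[root] = 1u << root;
    for (bool changed = true; changed;) {
        changed = false;
        for (int b = 0; b < n; ++b) {
            if (b == root) continue;
            uint32_t meet = all;
            for (int p : pred[b]) meet &= dom[p];   // intersect over predecessors
            const uint32_t next = meet | (1u << b); // a block dominates itself
            if (next != dom[b]) { dom[b] = next; changed = true; }
        }
    }
    return dom;
}

// True if blocks a and b are control equivalent in the sense used in the thread.
bool controlEquivalent(const Graph& succ, int entry, int exit, int a, int b) {
    const auto dom = dominators(succ, entry);
    // Post-dominators are dominators of the reversed graph rooted at the exit.
    const auto pdom = dominators(reverseGraph(succ), exit);
    auto dominates = [&](int x, int y) { return ((dom[y] >> x) & 1u) != 0; };
    auto postDominates = [&](int x, int y) { return ((pdom[y] >> x) & 1u) != 0; };
    return (dominates(a, b) && postDominates(b, a)) ||
           (dominates(b, a) && postDominates(a, b));
}

// Diamond: 0 = entry, 1 = then, 2 = else, 3 = join/exit.
Graph diamond() { return {{1, 2}, {3}, {3}, {}}; }
```

In the diamond, the entry block and the join block are control equivalent, while the entry block and the else block are not, which matches the proposal's point that sinking into the else block is illegal.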
Jingyue Wu via llvm-dev
2015-Aug-15 00:22 UTC
[llvm-dev] [LLVMdev] RFC: Convergent attribute
Loop unrolling.

On Fri, Aug 14, 2015 at 4:28 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hi James,
>
> That sounds reasonable to me. Any idea of a transformation that would want
> to do that though?
>
> --
> Mehdi
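To make the loop unrolling answer concrete, here is a minimal sketch of unrolling by a factor of two a loop whose body contains one convergent call. It models only how many times a single thread reaches the call; `Counter`, `convergentOp`, `rolled` and `unrolledBy2` are hypothetical names, and nothing here models real barrier or cross-thread behavior. Inside the unrolled body there is no branch between copy A and copy B, so A dominates B and B post-dominates A, which is the situation James describes. Whether the remainder loop is acceptable under the proposed definition is not settled in the thread.

```cpp
#include <cassert>

// Hypothetical stand-in for a convergent operation such as a barrier.
// It only counts how many times one thread executes it.
struct Counter {
    int calls = 0;
    void convergentOp() { ++calls; }
};

// Original loop: one convergent call per iteration.
int rolled(int n) {
    Counter c;
    for (int i = 0; i < n; ++i) c.convergentOp();
    return c.calls;
}

// Unrolled by two: the call is duplicated inside the loop body.
int unrolledBy2(int n) {
    Counter c;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        c.convergentOp();  // copy A
        c.convergentOp();  // copy B: straight-line after A, no branch between
    }
    for (; i < n; ++i) c.convergentOp();  // remainder iteration for odd n
    return c.calls;
}
```

For any trip count, both versions execute the call the same number of times, so a thread that reached the call in iteration k of the original loop still reaches exactly one copy for that iteration.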