Sameer Sahasrabuddhe via llvm-dev
2021-Jun-01 11:58 UTC
[llvm-dev] RFC: make calls "convergent" by default
TL;DR
=====

We propose the following changes to LLVM IR in order to better support
operations that are sensitive to the set of threads that execute them
together:

- Redefine "convergent" in terms of thread divergence in a
  multi-threaded execution.
- Fix all optimizations that examine the "convergent" attribute to also
  depend on divergence analysis. This avoids any impact on CPU
  compilation since control flow is always uniform on CPUs.
- Make all function calls "convergent" by default (D69498). Introduce a
  new "noconvergent" attribute, and make "convergent" a nop.
- Update the "convergence tokens" proposal to take into account this new
  default property (D85603).

Motivation
==========

This effort is necessary because the current "convergent" attribute is
considered under-defined and sorely needs replacement.

1. On GPU targets, the "convergent" attribute is required for
   correctness. This is unlike other attributes that are only
   used as optimization hints. Missing an attribute should not
   result in a miscompilation.

2. The current definition of the "convergent" attribute does not
   precisely represent the constraints on the compiler for a GPU target.
   The actual implementation in LLVM sources is far more conservative
   than what the definition says.

3. Due to the same lack of precision, the attribute cannot properly
   represent the side-effects of jump threading on a GPU program.

Background
==========

This RFC is a continuation of a discussion split across the following
two reviews. The two reviews compose well to cover all the shortcomings
of the convergent attribute.

D69498: IR: Invert convergent attribute handling
https://reviews.llvm.org/D69498

The above review aims to make all function calls "convergent" by
default, but it received strong opposition due to the requirement that
CPU frontends must now emit a new "noconvergent" attribute on every
function call.
D85603: IR: Add convergence control operand bundle and intrinsics
https://reviews.llvm.org/D85603

The above review defines a "convergent operation" in terms of divergent
control flow in multi-threaded executions. It introduces a "convergence
token" passed as an operand bundle argument at a call, representing the
set of threads that together execute that call. This review has
progressed to the point where there don't seem to be any major
objections to it, but there is some interest in combining it with the
original idea of making all calls convergent by default.

Terms Used
==========

The following definitions are paraphrased from D85603:

Convergent Operation

  Some parallel execution environments execute threads in groups that
  allow efficient communication within each group. When control flow
  diverges, i.e. threads of the same group follow different paths
  through the CFG, not all threads of the group may be available to
  participate in this communication. A convergent operation involves
  inter-thread communication or synchronization that occurs outside of
  the memory model, where the set of threads which participate in
  communication is implicitly affected by control flow.

Dynamic Instance

  Every execution of an LLVM IR instruction occurs in a dynamic instance
  of the instruction. Different executions of the same instruction by a
  single thread give rise to different dynamic instances of that
  instruction. Executions of different instructions always occur in
  different dynamic instances. Executions of the same instruction by
  different threads may occur in the same dynamic instance. When
  executing a convergent operation, the set of threads that execute the
  same dynamic instance is the set of threads that communicate with each
  other for that operation.
Optimization Constraints due to Convergent Calls
================================================

In general, an optimization that modifies control flow in the program
must ensure that the set of threads executing each dynamic instance of a
convergent call is not affected.

By default, every call in LLVM IR is assumed to be convergent. A
frontend may further relax this in the following ways:

1. The "noconvergent" attribute may be added to indicate that a call
   is not sensitive to the set of threads executing any dynamic
   instance of that call.

2. A "convergencectrl" operand bundle may be passed to the call. The
   semantics of such a "token" provide fine-grained control over the
   transforms possible near the callsite.

The overall effect is to make the notion of convergence and divergence a
universal property of LLVM IR. This provides a "safe default" in the IR
semantics, so that frontends and optimizations cannot produce incorrect
IR on a GPU target by merely missing an attribute.

At the same time, there is no effect on CPU optimizations. An
optimization may use divergence analysis along with the above
information to determine if a transformation is possible. The only
impact on CPU compilation flows is the addition of divergence analysis
as a dependency when checking for convergent operations. This analysis
is trivial on CPUs, where branches never diverge and hence all control
flow is uniform.

Implementation
==============

The above proposal will be implemented as follows:

1. Optimizations that check for convergent operations will be updated to
   depend on divergence analysis. For example, the following change will
   be made in llvm/lib/Transforms/Scalar/Sink.cpp:

   Before:

     bool isSafeToMove(Instruction *Inst) {
       ...
       if (auto *Call = dyn_cast<CallBase>(Inst)) {
         ...
         if (Call->isConvergent())
           return false;
         ...
       }
     }

   After:

     bool isSafeToMove(Instruction *Inst, DivergenceAnalysis &DA, ...) {
       ...
       // Don't sink a convergent call across a divergent branch.
       if (auto *Call = dyn_cast<CallBase>(Inst)) {
         ...
         auto *Term = Inst->getParent()->getTerminator();
         if (Call->isConvergent() && DA.isDivergent(Term))
           return false;
         ...
       }
     }

2. D69498 will be updated so that the convergent property is made
   default, but the new requirements on CPU frontends will be retracted.

3. D85603 will be revised to include the new default convergent
   property.

Thanks,
Sameer.
Sameer Sahasrabuddhe via llvm-dev
2021-Jun-02 06:02 UTC
[llvm-dev] RFC: make calls "convergent" by default
CC'ing some more people who got dropped when sending the previous mail.

Sameer.