Sameer Sahasrabuddhe via llvm-dev
2021-Jun-01 11:58 UTC
[llvm-dev] RFC: make calls "convergent" by default
TL;DR
====
We propose the following changes to LLVM IR in order to better support
operations that are sensitive to the set of threads that execute them
together:
- Redefine "convergent" in terms of thread divergence in a
multi-threaded execution.
- Fix all optimizations that examine the "convergent" attribute to also
depend on divergence analysis. This avoids any impact on CPU
compilation, since control flow is always uniform on CPUs.
- Make all function calls "convergent" by default (D69498). Introduce a
new "noconvergent" attribute, and make "convergent" a nop.
- Update the "convergence tokens" proposal to take into account this new
default property (D85603).
Motivation
=========
This effort is necessary because the current "convergent" attribute is
considered under-defined and sorely needs replacement.
1. On GPU targets, the "convergent" attribute is required for
correctness, unlike other attributes that are used only as
optimization hints. Missing an attribute should never result in a
miscompilation.
2. The current definition of the "convergent" attribute does not precisely
represent the constraints on the compiler for a GPU target. The
actual implementation in LLVM sources is far more conservative than
what the definition says.
3. Due to the same lack of precision, the attribute cannot properly
represent the side-effects of jump threading on a GPU program.
Background
=========
This RFC is a continuation of a discussion split across the following
two reviews. The two reviews compose well to cover all the shortcomings
of the convergent attribute.
D69498: IR: Invert convergent attribute handling
https://reviews.llvm.org/D69498
The above review aims to make all function calls "convergent" by
default, but it received strong opposition due to the requirement that
CPU frontends must now emit a new "noconvergent" attribute on every
function call.
D85603: IR: Add convergence control operand bundle and intrinsics
https://reviews.llvm.org/D85603
The above review defines a "convergent operation" in terms of divergent
control flow in multi-threaded executions. It introduces a "convergence
token" passed as an operand bundle argument at a call, representing the
set of threads that together execute that call. This review has
progressed to the point where there don't seem to be any major
objections to it, but there is some interest in combining it with the
original idea of making all calls convergent by default.
Terms Used
=========
The following definitions are paraphrased from D85603:
Convergent Operation
Some parallel execution environments execute threads in groups that
allow efficient communication within each group. When control flow
diverges, i.e. threads of the same group follow different paths
through the CFG, not all threads of the group may be available to
participate in this communication. A convergent operation involves
inter-thread communication or synchronization that occurs outside of
the memory model, where the set of threads which participate in
communication is implicitly affected by control flow.
Dynamic Instance
Every execution of an LLVM IR instruction occurs in a dynamic instance
of the instruction. Different executions of the same instruction by a
single thread give rise to different dynamic instances of that
instruction. Executions of different instructions always occur in
different dynamic instances. Executions of the same instruction by
different threads may occur in the same dynamic instance. When
executing a convergent operation, the set of threads that execute the
same dynamic instance is the set of threads that communicate with each
other for that operation.
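The sketch below uses a deliberately simplified rule, an assumption for illustration only (D85603 specifies the real rules, which are more subtle, especially for loops): threads that execute the same instruction for the k-th time share a dynamic instance. groupThreads is a hypothetical helper that returns, for each dynamic instance of one convergent instruction inside a loop, the set of threads that execute it together.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Simplified model (an assumption, not the D85603 rules): a dynamic
// instance is identified by the instruction and by how many times the
// executing thread has already executed it.
using InstrId = int;
using DynamicInstance = std::pair<InstrId, int>;

// Hypothetical helper: tripCounts[t] is the number of loop iterations run
// by thread t, and the loop body contains one convergent instruction.
// Returns the set of threads that execute each dynamic instance together.
std::map<DynamicInstance, std::set<int>>
groupThreads(InstrId instr, const std::vector<int> &tripCounts) {
  std::map<DynamicInstance, std::set<int>> instances;
  for (int t = 0; t < static_cast<int>(tripCounts.size()); ++t)
    for (int k = 0; k < tripCounts[t]; ++k)
      instances[DynamicInstance(instr, k)].insert(t); // t joins instance k
  return instances;
}
```

With trip counts {1, 2, 3}, the first instance is shared by threads {0, 1, 2}, the second by {1, 2}, and the third is executed by thread 2 alone, so a convergent operation in later iterations communicates among fewer threads.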
Optimization Constraints due to Convergent Calls
===============================================
In general, an optimization that modifies control flow in the program
must ensure that the set of threads executing each dynamic instance of a
convergent call is not affected.
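The constraint above can be phrased as a check. In this hypothetical toy model (no loops, so each block has a single dynamic instance), every basic block is annotated with the mask of threads that execute it together, and moving a convergent call between blocks is allowed only if the two masks are equal. BlockMasks and canMoveConvergentCall are illustrative names; real legality also depends on memory effects and other conditions that this sketch ignores.

```cpp
#include <bitset>
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy CFG without loops: every basic block has a single
// dynamic instance, annotated with the mask of threads (out of 8) that
// execute it together.
using ThreadMask = std::bitset<8>;
using BlockMasks = std::map<std::string, ThreadMask>;

// A convergent call may move from `from` to `to` only if the set of
// threads executing its dynamic instance is unchanged. Memory effects and
// other legality conditions are deliberately ignored here.
bool canMoveConvergentCall(const BlockMasks &masks, const std::string &from,
                           const std::string &to) {
  return masks.at(from) == masks.at(to);
}
```

For an if (tid < 4) region where "entry" and the join block "exit" run with all eight threads and "then" runs with threads 0 to 3, moving the call from "entry" into "then" is rejected, while moving it to "exit" passes this particular check.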
By default, every call in LLVM IR is assumed to be convergent. A
frontend may further relax this in the following ways:
1. The "noconvergent" attribute may be added to indicate that a call
is not sensitive to the set of threads executing any dynamic
instance of that call.
2. A "convergencectrl" operand bundle may be passed to the call. The
semantics of such a "token" provide fine-grained control over the
transforms possible near the callsite.
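The defaulting rules in the two items above can be summarized as a small classification. This is a hypothetical sketch: CallSiteInfo, ConvergenceKind and classify are not LLVM APIs, and letting "noconvergent" take precedence when both markers are present is an assumption of the sketch rather than something this RFC specifies.

```cpp
#include <cassert>

// Hypothetical summary of a call site; not LLVM's CallBase interface.
struct CallSiteInfo {
  bool hasNoConvergentAttr = false; // "noconvergent" attribute present
  bool hasConvergenceCtrl = false;  // "convergencectrl" bundle present
};

enum class ConvergenceKind {
  NotConvergent,    // insensitive to the set of executing threads
  TokenControlled,  // constraints come from the convergence token
  DefaultConvergent // no marker: conservatively treated as convergent
};

// Under the proposed default, a call is convergent unless marked otherwise.
// Giving "noconvergent" precedence over a token is an assumption here.
ConvergenceKind classify(const CallSiteInfo &call) {
  if (call.hasNoConvergentAttr)
    return ConvergenceKind::NotConvergent;
  if (call.hasConvergenceCtrl)
    return ConvergenceKind::TokenControlled;
  return ConvergenceKind::DefaultConvergent;
}
```

A call with neither marker, classify(CallSiteInfo{}), falls back to the conservative convergent treatment, so forgetting an annotation can only make the compiler more conservative, never incorrect.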
The overall effect is to make the notion of convergence and divergence a
universal property of LLVM IR. This provides a "safe default" in the IR
semantics, so that frontends and optimizations cannot produce incorrect
IR on a GPU target by merely missing an attribute.
At the same time, there is no effect on CPU optimizations. An
optimization may use divergence analysis along with the above
information to determine if a transformation is possible. The only
impact on CPU compilation flows is the addition of divergence analysis
as a dependency when checking for convergent operations. This analysis
is trivial on CPUs, where branches are never divergent and hence all
control flow is uniform.
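Why the analysis is trivial on CPUs can be illustrated with a minimal forward-propagation sketch. Assumptions: the hypothetical Def list is straight-line code in definition order, the only source of divergence is a value flagged as the thread ID, and control-dependent (sync) divergence at join points is ignored, so this is far simpler than LLVM's actual divergence analysis.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical straight-line program: each value is defined from operands
// that appear earlier in the list (no branches or phis are modeled).
struct Def {
  std::string name;
  std::vector<std::string> operands;
  bool isThreadId = false; // assumed sole source of divergence
};

// Minimal forward propagation: on a GPU, any value that depends on the
// thread ID is divergent. On a CPU each "group" has a single thread, so
// nothing can diverge and the analysis returns an empty set immediately.
// Sync (control-dependent) divergence is ignored in this sketch.
std::set<std::string> divergentValues(const std::vector<Def> &defs,
                                      bool isGPU) {
  std::set<std::string> divergent;
  if (!isGPU)
    return divergent;
  for (const Def &def : defs) {
    bool isDivergent = def.isThreadId;
    for (const std::string &op : def.operands)
      if (divergent.count(op) != 0)
        isDivergent = true;
    if (isDivergent)
      divergent.insert(def.name);
  }
  return divergent;
}
```

For a branch condition c1 computed from tid and a second condition c2 computed only from a uniform value n, the GPU run marks c1 divergent and c2 uniform, while the CPU run returns an empty set, so no convergent call is ever blocked on a CPU.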
Implementation
=============
The above proposal will be implemented as follows:
1. Optimizations that check for convergent operations will be updated to
depend on divergence analysis. For example, the following change will
be made in llvm/lib/Transforms/Scalar/Sink.cpp:
Before:

    bool isSafeToMove(Instruction *Inst) {
      ...
      if (auto *Call = dyn_cast<CallBase>(Inst)) {
        ...
        if (Call->isConvergent())
          return false;
        ...
      }
    }

After:

    bool isSafeToMove(Instruction *Inst, DivergenceAnalysis &DA, ...) {
      ...
      // don't sink a convergent call across a divergent branch
      if (auto *Call = dyn_cast<CallBase>(Inst)) {
        ...
        auto *Term = Inst->getParent()->getTerminator();
        if (Call->isConvergent() && DA.isDivergent(Term))
          return false;
        ...
      }
    }
2. D69498 will be updated so that the convergent property is made
default, but the new requirements on CPU frontends will be retracted.
3. D85603 will be revised to include the new default convergent
property.
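The effect of the Sink.cpp change in item 1 can be checked in isolation with mock types. MockCall, MockBlock and isSafeToSink are hypothetical stand-ins for CallBase, the block terminator and the DA.isDivergent(Term) query; all other legality checks performed by the real isSafeToMove are omitted.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM types in the Sink.cpp sketch.
struct MockCall {
  bool convergent = true; // the proposed default for every call
};

struct MockBlock {
  bool terminatorIsDivergent = false; // result of DA.isDivergent(Term)
};

// Mirrors the "After" check: a convergent call is not sunk past a
// divergent terminator. Other legality checks are omitted.
bool isSafeToSink(const MockCall &call, const MockBlock &block) {
  if (call.convergent && block.terminatorIsDivergent)
    return false;
  return true;
}
```

Every call is convergent by default, yet on a CPU the terminator is never divergent, so isSafeToSink(MockCall{}, MockBlock{false}) still returns true. Only a divergent terminator on a GPU blocks the sink, and a call marked "noconvergent" (MockCall{false}) is never blocked by this check.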
Thanks,
Sameer.
Sameer Sahasrabuddhe via llvm-dev
2021-Jun-02 06:02 UTC
[llvm-dev] RFC: make calls "convergent" by default
CC'ing some more people who got dropped when sending the previous mail.

Sameer.