Sameer Sahasrabuddhe via llvm-dev
2021-Jun-02 06:02 UTC
[llvm-dev] RFC: make calls "convergent" by default
CC'ing some more people who got dropped when sending the previous mail.

Sameer.

Sameer Sahasrabuddhe via llvm-dev writes:

> TL;DR
> =====
>
> We propose the following changes to LLVM IR in order to better support
> operations that are sensitive to the set of threads that execute them
> together:
>
> - Redefine "convergent" in terms of thread divergence in a
>   multi-threaded execution.
> - Fix all optimizations that examine the "convergent" attribute to
>   also depend on divergence analysis. This avoids any impact on CPU
>   compilation since control flow is always uniform on CPUs.
> - Make all function calls "convergent" by default (D69498). Introduce
>   a new "noconvergent" attribute, and make "convergent" a nop.
> - Update the "convergence tokens" proposal to take into account this
>   new default property (D85603).
>
> Motivation
> ==========
>
> This effort is necessary because the current "convergent" attribute is
> considered under-defined and sorely needs replacement.
>
> 1. On GPU targets, the "convergent" attribute is required for
>    correctness. This is unlike other attributes that are only used as
>    optimization hints. Missing an attribute should not result in a
>    miscompilation.
>
> 2. The current definition of the "convergent" attribute does not
>    precisely represent the constraints on the compiler for a GPU
>    target. The actual implementation in LLVM sources is far more
>    conservative than what the definition says.
>
> 3. Due to the same lack of precision, the attribute cannot properly
>    represent the side-effects of jump threading on a GPU program.
>
> Background
> ==========
>
> This RFC is a continuation of a discussion split across the following
> two reviews. The two reviews compose well to cover all the
> shortcomings of the convergent attribute.
>
> D69498: IR: Invert convergent attribute handling
> https://reviews.llvm.org/D69498
>
> The above review aims to make all function calls "convergent" by
> default, but it received strong opposition due to the requirement that
> CPU frontends must now emit a new "noconvergent" attribute on every
> function call.
>
> D85603: IR: Add convergence control operand bundle and intrinsics
> https://reviews.llvm.org/D85603
>
> The above review defines a "convergent operation" in terms of
> divergent control flow in multi-threaded executions. It introduces a
> "convergence token" passed as an operand bundle argument at a call,
> representing the set of threads that together execute that call. This
> review has progressed to the point where there don't seem to be any
> major objections to it, but there is some interest in combining it
> with the original idea of making all calls convergent by default.
>
> Terms Used
> ==========
>
> The following definitions are paraphrased from D85603:
>
> Convergent Operation
>
>   Some parallel execution environments execute threads in groups that
>   allow efficient communication within each group. When control flow
>   diverges, i.e., threads of the same group follow different paths
>   through the CFG, not all threads of the group may be available to
>   participate in this communication. A convergent operation involves
>   inter-thread communication or synchronization that occurs outside of
>   the memory model, where the set of threads which participate in
>   communication is implicitly affected by control flow.
>
> Dynamic Instance
>
>   Every execution of an LLVM IR instruction occurs in a dynamic
>   instance of the instruction. Different executions of the same
>   instruction by a single thread give rise to different dynamic
>   instances of that instruction. Executions of different instructions
>   always occur in different dynamic instances. Executions of the same
>   instruction by different threads may occur in the same dynamic
>   instance. When executing a convergent operation, the set of threads
>   that execute the same dynamic instance is the set of threads that
>   communicate with each other for that operation.
>
> Optimization Constraints due to Convergent Calls
> ================================================
>
> In general, an optimization that modifies control flow in the program
> must ensure that the set of threads executing each dynamic instance of
> a convergent call is not affected.
>
> By default, every call in LLVM IR is assumed to be convergent. A
> frontend may further relax this in the following ways:
>
> 1. The "noconvergent" attribute may be added to indicate that a call
>    is not sensitive to the set of threads executing any dynamic
>    instance of that call.
>
> 2. A "convergencectrl" operand bundle may be passed to the call. The
>    semantics of such a "token" provide fine-grained control over the
>    transforms possible near the callsite.
>
> The overall effect is to make the notion of convergence and divergence
> a universal property of LLVM IR. This provides a "safe default" in the
> IR semantics, so that frontends and optimizations cannot produce
> incorrect IR on a GPU target by merely missing an attribute.
>
> At the same time, there is no effect on CPU optimizations. An
> optimization may use divergence analysis along with the above
> information to determine if a transformation is possible. The only
> impact on CPU compilation flows is the addition of divergence analysis
> as a dependency when checking for convergent operations. This analysis
> is trivial on CPUs, where branches are never divergent and hence all
> control flow is uniform.
>
> Implementation
> ==============
>
> The above proposal will be implemented as follows:
>
> 1. Optimizations that check for convergent operations will be updated
>    to depend on divergence analysis. For example, the following change
>    will be made in llvm/lib/Transforms/Scalar/Sink.cpp:
>
>    Before:
>
>      bool isSafeToMove(Instruction *Inst) {
>        ...
>        if (auto *Call = dyn_cast<CallBase>(Inst)) {
>          ...
>          if (Call->isConvergent())
>            return false;
>          ...
>        }
>      }
>
>    After:
>
>      bool isSafeToMove(Instruction *Inst, DivergenceAnalysis &DA, ...) {
>        ...
>        if (auto *Call = dyn_cast<CallBase>(Inst)) {
>          ...
>          // Don't sink a convergent call across a divergent branch.
>          auto *Term = Inst->getParent()->getTerminator();
>          if (Call->isConvergent() && DA.isDivergent(Term))
>            return false;
>          ...
>        }
>      }
>
> 2. D69498 will be updated so that the convergent property is made the
>    default, but the new requirements on CPU frontends will be
>    retracted.
>
> 3. D85603 will be revised to include the new default convergent
>    property.
>
> Thanks,
> Sameer.
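As a concrete illustration of the constraint on sinking, here is a
minimal sketch assuming an AMDGPU wave64 target. The ballot intrinsic
is an existing convergent operation, but any convergent call behaves
the same way:

    define amdgpu_kernel void @example(i64 addrspace(1)* %out) {
    entry:
      ; Convergent: the result depends on the set of threads that
      ; execute this dynamic instance together.
      %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 true)
      ; The lane id differs per thread, so the branch below is divergent.
      %tid = call i32 @llvm.amdgcn.workitem.id.x()
      %cond = icmp slt i32 %tid, 16
      br i1 %cond, label %then, label %exit

    then:                          ; reached only by threads with %tid < 16
      store i64 %ballot, i64 addrspace(1)* %out
      br label %exit

    exit:
      ret void
    }

    declare i64 @llvm.amdgcn.ballot.i64(i1) #0
    declare i32 @llvm.amdgcn.workitem.id.x()
    attributes #0 = { convergent nounwind readnone }

The only use of %ballot is in %then, so Sink would normally want to
move it there; doing so would change the set of threads executing its
dynamic instance. That is exactly the case the updated check in
Sink.cpp refuses when the controlling branch is divergent.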
John McCall via llvm-dev
2021-Jun-02 08:06 UTC
[llvm-dev] RFC: make calls "convergent" by default
On 2 Jun 2021, at 2:02, Sameer Sahasrabuddhe wrote:

> Sameer Sahasrabuddhe via llvm-dev writes:
>
>> TL;DR
>> =====
>>
>> We propose the following changes to LLVM IR in order to better support
>> operations that are sensitive to the set of threads that execute them
>> together:
>>
>> - Redefine "convergent" in terms of thread divergence in a
>>   multi-threaded execution.
>> - Fix all optimizations that examine the "convergent" attribute to
>>   also depend on divergence analysis. This avoids any impact on CPU
>>   compilation since control flow is always uniform on CPUs.
>> - Make all function calls "convergent" by default (D69498). Introduce
>>   a new "noconvergent" attribute, and make "convergent" a nop.
>> - Update the "convergence tokens" proposal to take into account this
>>   new default property (D85603).

I would suggest a slightly different way of thinking of this. It’s not
really that functions are defaulting to convergence, it’s that they’re
defaulting to not participating in the convergence analysis. A function
that does participate in the analysis should have a way to mark itself
as being convergent. A function that participates and isn’t marked
convergent should probably default to being non-convergent, because
that’s the conservative assumption (I believe). But if a function
doesn’t participate in the analysis at all, well, it just doesn’t
apply.

At an IR level, there are a couple of different ways to model this. One
option is to have two different attributes, e.g. `hasconvergence
convergent`. But the second attribute would be meaningless without the
first, and clients would have to look up both, which is needlessly
inefficient. The other option is to have one attribute with an
argument, e.g. `convergent(true)`. Looking up the attribute would give
you both pieces of information.

GPU targets would presumably require functions (maybe just
definitions?) to participate in the convergence analysis. Or maybe they
could have different default rules for functions that don’t participate
than CPU targets do. Either seems a reasonable choice to me. If the
inliner wants to inline non-participating code into participating code,
or vice-versa, it either needs to refuse or to mark the resulting
function as non-participating.

I know this is a little bit more complex than what you’re describing,
but I think it’s useful complexity, and I think it’s important to set a
good example for how to handle this kind of thing. Non-convergence is a
strange property in many ways because of its dependence on the exact
code structure rather than simply the code’s ordinary semantics. But if
you consider it more abstractly in terms of the shape of the problem,
it’s actually a very standard example of an “effect”, and convergence
analysis is just another example of an “effect analysis”, which is a
large class of analyses with the same basic structure:

- There’s some sort of abstract effect.
- There are some primitive operations that have the effect.
- The effect normally propagates through abstractions: if code calls
  other code that has the effect, the calling code also has the effect.
- The propagation is disjunctive: a code sequence has the effect if any
  part of the sequence has the effect.
- Often it is rare to see the primitive operations explicitly in code,
  and the analysis is largely about propagation. Sometimes the
  primitive operations aren’t even modeled in IR at all, and the only
  source of the effect in the model is that calls to unknown functions
  have to be treated conservatively.
- Sometimes there are ways of preventing propagation; this is usually
  called “handling” the effect. But a lot of effects don’t have this,
  and the analysis is purely about whether one of the primitive
  operations is ever performed (directly or indirectly).
- Clients are usually trying to prove that code *doesn’t* have the
  effect, because that gives them more flexibility.
- Code has to be assumed to have the effect by default, but if you can
  prove that a function doesn’t have the effect, you can often
  propagate that information.

The thing is, people are constantly inventing new effect analyses. LLVM
has some built-in analyses that are basically effect analyses, like
“does this touch global memory” or “does this have any side-effects”.
Maybe soon we’ll want to do a new general analysis in LLVM to check
whether a function synchronizes with other threads (in the more
standard atomics/locks sense, not GPU thread communication). Maybe
somebody will add a language-specific analysis to track if a function
ever runs “unsafe” code. Maybe somebody will want to do an
environment-specific analysis that checks whether a function ever makes
an I/O call. Who knows? But they come up a lot, and LLVM doesn’t deal
with them very well when it can’t make nice assumptions like “all the
code came from the same frontend and is correctly participating in the
analysis”.

Convergence is important enough for GPUs that maybe it’s worthwhile for
all GPU frontends — and so all functions in a module — to participate
in it. A lot of these other analyses, well, probably not. And we
shouldn’t be totally blocked from doing interprocedural optimization in
LLVM just because we’re combining things from different frontends.

So my interest here is that I’d like the IR for convergence to set a
good example for how to model this kind of effect analysis. I think
that starts with acknowledging that maybe not all functions are
participating in the analysis and that that’s okay. And I think that
lets us more neatly talk about what we want for convergence: either you
want to require that all functions in the module participate in the
analysis, or you want to recognize non-participating code and treat it
more conservatively. Other than that, I don’t much care about the rest
of the details; this isn’t my domain, and you all know what you’re
trying to do better than I do.

John.
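For concreteness, the two spellings discussed above might look roughly
like this at the IR level. Neither `hasconvergence` nor
`convergent(true)` exists in LLVM today; the syntax here is purely
illustrative of the two modeling options:

    ; Option 1: two attributes. "hasconvergence" opts a function into
    ; the convergence analysis; "convergent" is only meaningful when it
    ; is present.
    declare void @f() hasconvergence convergent
    declare void @g() hasconvergence        ; participates, non-convergent
    declare void @h()                       ; does not participate at all

    ; Option 2: one attribute with an argument, so a single lookup
    ; answers both questions (does the function participate, and is it
    ; convergent).
    declare void @f2() convergent(true)
    declare void @g2() convergent(false)
    declare void @h2()                      ; does not participate at all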
Sameer Sahasrabuddhe via llvm-dev
2021-Jun-18 03:02 UTC
[llvm-dev] RFC: make calls "convergent" by default
Sameer Sahasrabuddhe writes:

> CC'ing some more people who got dropped when sending the previous mail.
>
> Sameer.
>
> Sameer Sahasrabuddhe via llvm-dev writes:
>
>> TL;DR
>> =====
>>
>> We propose the following changes to LLVM IR in order to better support
>> operations that are sensitive to the set of threads that execute them
>> together:
>>
>> - Redefine "convergent" in terms of thread divergence in a
>>   multi-threaded execution.
>> - Fix all optimizations that examine the "convergent" attribute to
>>   also depend on divergence analysis. This avoids any impact on CPU
>>   compilation since control flow is always uniform on CPUs.
>> - Make all function calls "convergent" by default (D69498). Introduce
>>   a new "noconvergent" attribute, and make "convergent" a nop.
>> - Update the "convergence tokens" proposal to take into account this
>>   new default property (D85603).

Here's an RFC designed to look like an incremental change over
Nicolai's original spec for convergence control intrinsics (Phabricator
is pretty awesome that way).

RFC: Update token semantics with default convergent attribute
https://reviews.llvm.org/D104504

This RFC has two parts:

LangRef: Defines the "convergent" property in LLVM IR and introduces
the "noconvergent" attribute. This is independent of convergence
control intrinsics and tokens. This part is intended to be submitted
first, and it replaces D69498 (IR: Invert convergent attribute
handling):
https://reviews.llvm.org/D69498

ConvergentOperations: Updates the semantics of convergence control
intrinsics and tokens to account for the new default convergent
property. This part is intended to be merged into D85603 (IR: Add
convergence control operand bundle and intrinsics):
https://reviews.llvm.org/D85603

Sameer.
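To make the intended end state concrete, here is a small sketch of what
call sites could look like once both parts land. The exact attribute
and bundle spellings are defined by D104504 and D85603; this snippet is
only illustrative:

    declare void @unknown()
    declare token @llvm.experimental.convergence.entry()

    define void @caller() {
    entry:
      ; Convergence token for this function (D85603); the entry
      ; intrinsic is expected at the start of the entry block.
      %tok = call token @llvm.experimental.convergence.entry()

      ; A call with no annotation is convergent by default under the
      ; proposed LangRef wording.
      call void @unknown()

      ; A frontend that knows the call is insensitive to the set of
      ; threads executing it can add the proposed "noconvergent"
      ; attribute at the call site.
      call void @unknown() noconvergent

      ; A convergence token makes the relevant set of threads explicit
      ; via the "convergencectrl" operand bundle.
      call void @unknown() [ "convergencectrl"(token %tok) ]
      ret void
    }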