Nicolai Hähnle via llvm-dev
2016-Oct-26 14:54 UTC
[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters
On 25.10.2016 16:28, Nicolai Hähnle wrote:> But I fear that this path leads to eternal fuzziness. Let me try a > completely different approach to define what we need by augmenting the > semantics of IR with "divergence tokens". In addition to its usual > value, every IR value carries a "divergence set" of divergence tokens. > > The basic rule is: the divergence set of a value is (at least) the union > of the divergence sets of its operands. > > Every function input carries a unique divergence token. > > For phi-nodes, the divergence set also includes the branch conditions > that can affect which predecessor value is taken. (A simpler, more > conservative rule, would be to add a unique divergence token; this can > be per-basic block.) > > Loads add a unique divergence token. (This is a very conservative rule > mostly required for per-thread-allocas. It could be relaxed quite a bit > by treating per-thread-memory like SSA-values augmented with divergence > sets.) > > Atomic operations add a unique divergence token. > > Additional target-specific intrinsics may also add a divergence token. > > By transitivity, most function calls have to add a unique divergence token. > > The first restriction on the transformation of a call to a function with > 'uniform' function argument is then: > > 1. The corresponding argument of the call can be changed only if the > change preserves (or shrinks) the divergence set (on top of that, the > new value obviously has to be equal to the old one in single threaded > semantics).Turns out that this isn't actually sufficient. Consider this (admittedly convoluted) example: idx = input & 1 v0 = texelFetch(s[idx & 0], ...) v1 = texelFetch(s[idx | 1], ...) cond = trunc idx to i1 v = select i1 cond, v1, v0 transformed into idx = in & 1 v = texelFetch(s[idx], ...) has the same divergence set for the argument to texelFetch as defined in the quote above, but the transformation is incorrect. So it's pretty clear what we want: We want a way to flag certain function arguments in a way that forbids transformations of the form select(call ... , call ...) -> call (select ..., ...), ... But it would be nice to have a clear definition of _why_ those transformations must be forbidden. It's not clear how to do that without pulling in a full model of SIMT-style parallel execution, and admittedly I don't think we have a sane model for _that_ in the first place :-( Something that at least partially addresses the SIMT-style semantics: For every pair (initial state, function inputs) and every call site of the relevant function, keep a log of function arguments with which the function has been called. The restriction on program transformations is roughly: two pairs A and B of (initial state, function inputs) are compatible when run on the original program if at each call site the log of A is a subsequence of the log of B or vice versa. Then every pair A,B that is compatible when run on the original program must also be compatible on the transformed program. This isn't a great definition either, but it gets closer to the actual SIMT-semantics. Any additional ideas would be very much appreciated. Thanks, Nicolai> > With this definition, > > def @fn(s0, s1, cond) { > v0 = texelFetch(s0, ...) > v1 = texelFetch(s1, ...) > v = select i1 cond, v0, v1 > } > > cannot be transformed to > > def @fn(s0, s1, cond) { > s = select i1 cond, s0, s1 > v = texelFetch(s, ...) > } > > because s has a larger divergence set. > > On the other hand, at least the InstCombine transforms should all be > safe because they locally transform a DAG of values in a way that can > only shrink the set of predecessors of the locally transformed region of > the DAG. > > Transforms that modify the CFG could be problematic especially with the > simpler phi-node-rule, e.g. when a new basic block is created, you end > up introducing new divergence tokens for "downstream" values. That's > probably not a problem with the first definition for phi-nodes, though > I'm not sure. > > ..... > The above definition still allows the transformation of > > masked &= 1 > if (masked == 1) { > v1 = texelFetch(array[masked], ...) > } else { > v0 = texelFetch(array[masked], ...) > } > v = phi(v1, v0) > > to > > masked &= 1 > v = texelFetch(array[masked], ...); > > which is incorrect. So for a set of divergence tokens D, let's say A > D-dominates B if the following holds: there exists a way of fixing the > branching directions of conditional branches whose condition's > divergence set is disjoint from D, such that A dominates B in the > resulting CFG. Similarly for D-post-domination. > > If D \subset D', then D'-domination implies D-domination. > > We can then formulate the second condition as: > > 2a. A function call with a uniform argument with divergence set D can be > moved from location A to location B if A D-dominates or D-post-dominates B. > > 2b. Simpler but more conservatively, just treat the function call as > co-convergent (i.e., it can only be moved to positions that are > dominated or post-dominated by its current position in the usual sense.) > > ..... > The full definition of D-domination is probably too involved to be > turned into practical algorithms. But with this model of divergence > sets, it should be possible to think of DivergenceAnalysis as an > analysis that proves divergence sets of values to be empty (using > additional target specific information to be less conservative). > Combined with DivergenceAnalysis, (1)+(2a) still leave room for some > useful and practical optimizations of functions with uniform arguments > even across basic blocks. > > In practical terms for a first version, the effect of the attribute > should just be to prevent the two transformations shown above. > > Thanks, > Nicolai
Justin Lebar via llvm-dev
2016-Oct-31 18:37 UTC
[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters
(I work on CUDA / PTX.) For one thing I'm in favor of having fewer annotations rather than more, so if we can do this in a reasonable way without introducing the notion of co-convergent calls, I think that would be a win. The one convergent annotation is difficult enough for the GPU folks to grok and then keep in cache, and everyone who works on llvm has to pay the cost of keeping their passes compliant with these annotations. I will think on a way to formalize this "uniform" attribute -- I don't have any good ideas at the moment. Are you going to be at the llvm dev meeting this week? We might be able to make progress brainstorming this in person. -Justin On Wed, Oct 26, 2016 at 7:54 AM, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote:> On 25.10.2016 16:28, Nicolai Hähnle wrote: >> >> But I fear that this path leads to eternal fuzziness. Let me try a >> completely different approach to define what we need by augmenting the >> semantics of IR with "divergence tokens". In addition to its usual >> value, every IR value carries a "divergence set" of divergence tokens. >> >> The basic rule is: the divergence set of a value is (at least) the union >> of the divergence sets of its operands. >> >> Every function input carries a unique divergence token. >> >> For phi-nodes, the divergence set also includes the branch conditions >> that can affect which predecessor value is taken. (A simpler, more >> conservative rule, would be to add a unique divergence token; this can >> be per-basic block.) >> >> Loads add a unique divergence token. (This is a very conservative rule >> mostly required for per-thread-allocas. It could be relaxed quite a bit >> by treating per-thread-memory like SSA-values augmented with divergence >> sets.) >> >> Atomic operations add a unique divergence token. >> >> Additional target-specific intrinsics may also add a divergence token. >> >> By transitivity, most function calls have to add a unique divergence >> token. >> >> The first restriction on the transformation of a call to a function with >> 'uniform' function argument is then: >> >> 1. The corresponding argument of the call can be changed only if the >> change preserves (or shrinks) the divergence set (on top of that, the >> new value obviously has to be equal to the old one in single threaded >> semantics). > > > Turns out that this isn't actually sufficient. Consider this (admittedly > convoluted) example: > > idx = input & 1 > v0 = texelFetch(s[idx & 0], ...) > v1 = texelFetch(s[idx | 1], ...) > cond = trunc idx to i1 > v = select i1 cond, v1, v0 > > transformed into > > idx = in & 1 > v = texelFetch(s[idx], ...) > > has the same divergence set for the argument to texelFetch as defined in the > quote above, but the transformation is incorrect. > > So it's pretty clear what we want: We want a way to flag certain function > arguments in a way that forbids transformations of the form select(call ... > , call ...) -> call (select ..., ...), ... > > But it would be nice to have a clear definition of _why_ those > transformations must be forbidden. It's not clear how to do that without > pulling in a full model of SIMT-style parallel execution, and admittedly I > don't think we have a sane model for _that_ in the first place :-( > > Something that at least partially addresses the SIMT-style semantics: For > every pair (initial state, function inputs) and every call site of the > relevant function, keep a log of function arguments with which the function > has been called. > > The restriction on program transformations is roughly: two pairs A and B of > (initial state, function inputs) are compatible when run on the original > program if at each call site the log of A is a subsequence of the log of B > or vice versa. Then every pair A,B that is compatible when run on the > original program must also be compatible on the transformed program. > > This isn't a great definition either, but it gets closer to the actual > SIMT-semantics. Any additional ideas would be very much appreciated. > > Thanks, > Nicolai > > >> >> With this definition, >> >> def @fn(s0, s1, cond) { >> v0 = texelFetch(s0, ...) >> v1 = texelFetch(s1, ...) >> v = select i1 cond, v0, v1 >> } >> >> cannot be transformed to >> >> def @fn(s0, s1, cond) { >> s = select i1 cond, s0, s1 >> v = texelFetch(s, ...) >> } >> >> because s has a larger divergence set. >> >> On the other hand, at least the InstCombine transforms should all be >> safe because they locally transform a DAG of values in a way that can >> only shrink the set of predecessors of the locally transformed region of >> the DAG. >> >> Transforms that modify the CFG could be problematic especially with the >> simpler phi-node-rule, e.g. when a new basic block is created, you end >> up introducing new divergence tokens for "downstream" values. That's >> probably not a problem with the first definition for phi-nodes, though >> I'm not sure. >> >> ..... >> The above definition still allows the transformation of >> >> masked &= 1 >> if (masked == 1) { >> v1 = texelFetch(array[masked], ...) >> } else { >> v0 = texelFetch(array[masked], ...) >> } >> v = phi(v1, v0) >> >> to >> >> masked &= 1 >> v = texelFetch(array[masked], ...); >> >> which is incorrect. So for a set of divergence tokens D, let's say A >> D-dominates B if the following holds: there exists a way of fixing the >> branching directions of conditional branches whose condition's >> divergence set is disjoint from D, such that A dominates B in the >> resulting CFG. Similarly for D-post-domination. >> >> If D \subset D', then D'-domination implies D-domination. >> >> We can then formulate the second condition as: >> >> 2a. A function call with a uniform argument with divergence set D can be >> moved from location A to location B if A D-dominates or D-post-dominates >> B. >> >> 2b. Simpler but more conservatively, just treat the function call as >> co-convergent (i.e., it can only be moved to positions that are >> dominated or post-dominated by its current position in the usual sense.) >> >> ..... >> The full definition of D-domination is probably too involved to be >> turned into practical algorithms. But with this model of divergence >> sets, it should be possible to think of DivergenceAnalysis as an >> analysis that proves divergence sets of values to be empty (using >> additional target specific information to be less conservative). >> Combined with DivergenceAnalysis, (1)+(2a) still leave room for some >> useful and practical optimizations of functions with uniform arguments >> even across basic blocks. >> >> In practical terms for a first version, the effect of the attribute >> should just be to prevent the two transformations shown above. >> >> Thanks, >> Nicolai > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Nicolai Hähnle via llvm-dev
2016-Oct-31 20:49 UTC
[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters
On 31.10.2016 19:37, Justin Lebar wrote:> (I work on CUDA / PTX.) > > For one thing I'm in favor of having fewer annotations rather than > more, so if we can do this in a reasonable way without introducing the > notion of co-convergent calls, I think that would be a win. The one > convergent annotation is difficult enough for the GPU folks to grok > and then keep in cache, and everyone who works on llvm has to pay the > cost of keeping their passes compliant with these annotations.I know. I'd love to find a simpler way to express this constraint, so any ideas are welcome :)> I will think on a way to formalize this "uniform" attribute -- I don't > have any good ideas at the moment. Are you going to be at the llvm > dev meeting this week? We might be able to make progress > brainstorming this in person.Unfortunately I won't be there. Nicolai> > -Justin > > On Wed, Oct 26, 2016 at 7:54 AM, Nicolai Hähnle via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> On 25.10.2016 16:28, Nicolai Hähnle wrote: >>> >>> But I fear that this path leads to eternal fuzziness. Let me try a >>> completely different approach to define what we need by augmenting the >>> semantics of IR with "divergence tokens". In addition to its usual >>> value, every IR value carries a "divergence set" of divergence tokens. >>> >>> The basic rule is: the divergence set of a value is (at least) the union >>> of the divergence sets of its operands. >>> >>> Every function input carries a unique divergence token. >>> >>> For phi-nodes, the divergence set also includes the branch conditions >>> that can affect which predecessor value is taken. (A simpler, more >>> conservative rule, would be to add a unique divergence token; this can >>> be per-basic block.) >>> >>> Loads add a unique divergence token. (This is a very conservative rule >>> mostly required for per-thread-allocas. It could be relaxed quite a bit >>> by treating per-thread-memory like SSA-values augmented with divergence >>> sets.) >>> >>> Atomic operations add a unique divergence token. >>> >>> Additional target-specific intrinsics may also add a divergence token. >>> >>> By transitivity, most function calls have to add a unique divergence >>> token. >>> >>> The first restriction on the transformation of a call to a function with >>> 'uniform' function argument is then: >>> >>> 1. The corresponding argument of the call can be changed only if the >>> change preserves (or shrinks) the divergence set (on top of that, the >>> new value obviously has to be equal to the old one in single threaded >>> semantics). >> >> >> Turns out that this isn't actually sufficient. Consider this (admittedly >> convoluted) example: >> >> idx = input & 1 >> v0 = texelFetch(s[idx & 0], ...) >> v1 = texelFetch(s[idx | 1], ...) >> cond = trunc idx to i1 >> v = select i1 cond, v1, v0 >> >> transformed into >> >> idx = in & 1 >> v = texelFetch(s[idx], ...) >> >> has the same divergence set for the argument to texelFetch as defined in the >> quote above, but the transformation is incorrect. >> >> So it's pretty clear what we want: We want a way to flag certain function >> arguments in a way that forbids transformations of the form select(call ... >> , call ...) -> call (select ..., ...), ... >> >> But it would be nice to have a clear definition of _why_ those >> transformations must be forbidden. It's not clear how to do that without >> pulling in a full model of SIMT-style parallel execution, and admittedly I >> don't think we have a sane model for _that_ in the first place :-( >> >> Something that at least partially addresses the SIMT-style semantics: For >> every pair (initial state, function inputs) and every call site of the >> relevant function, keep a log of function arguments with which the function >> has been called. >> >> The restriction on program transformations is roughly: two pairs A and B of >> (initial state, function inputs) are compatible when run on the original >> program if at each call site the log of A is a subsequence of the log of B >> or vice versa. Then every pair A,B that is compatible when run on the >> original program must also be compatible on the transformed program. >> >> This isn't a great definition either, but it gets closer to the actual >> SIMT-semantics. Any additional ideas would be very much appreciated. >> >> Thanks, >> Nicolai >> >> >>> >>> With this definition, >>> >>> def @fn(s0, s1, cond) { >>> v0 = texelFetch(s0, ...) >>> v1 = texelFetch(s1, ...) >>> v = select i1 cond, v0, v1 >>> } >>> >>> cannot be transformed to >>> >>> def @fn(s0, s1, cond) { >>> s = select i1 cond, s0, s1 >>> v = texelFetch(s, ...) >>> } >>> >>> because s has a larger divergence set. >>> >>> On the other hand, at least the InstCombine transforms should all be >>> safe because they locally transform a DAG of values in a way that can >>> only shrink the set of predecessors of the locally transformed region of >>> the DAG. >>> >>> Transforms that modify the CFG could be problematic especially with the >>> simpler phi-node-rule, e.g. when a new basic block is created, you end >>> up introducing new divergence tokens for "downstream" values. That's >>> probably not a problem with the first definition for phi-nodes, though >>> I'm not sure. >>> >>> ..... >>> The above definition still allows the transformation of >>> >>> masked &= 1 >>> if (masked == 1) { >>> v1 = texelFetch(array[masked], ...) >>> } else { >>> v0 = texelFetch(array[masked], ...) >>> } >>> v = phi(v1, v0) >>> >>> to >>> >>> masked &= 1 >>> v = texelFetch(array[masked], ...); >>> >>> which is incorrect. So for a set of divergence tokens D, let's say A >>> D-dominates B if the following holds: there exists a way of fixing the >>> branching directions of conditional branches whose condition's >>> divergence set is disjoint from D, such that A dominates B in the >>> resulting CFG. Similarly for D-post-domination. >>> >>> If D \subset D', then D'-domination implies D-domination. >>> >>> We can then formulate the second condition as: >>> >>> 2a. A function call with a uniform argument with divergence set D can be >>> moved from location A to location B if A D-dominates or D-post-dominates >>> B. >>> >>> 2b. Simpler but more conservatively, just treat the function call as >>> co-convergent (i.e., it can only be moved to positions that are >>> dominated or post-dominated by its current position in the usual sense.) >>> >>> ..... >>> The full definition of D-domination is probably too involved to be >>> turned into practical algorithms. But with this model of divergence >>> sets, it should be possible to think of DivergenceAnalysis as an >>> analysis that proves divergence sets of values to be empty (using >>> additional target specific information to be less conservative). >>> Combined with DivergenceAnalysis, (1)+(2a) still leave room for some >>> useful and practical optimizations of functions with uniform arguments >>> even across basic blocks. >>> >>> In practical terms for a first version, the effect of the attribute >>> should just be to prevent the two transformations shown above. >>> >>> Thanks, >>> Nicolai >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Nicolai Hähnle via llvm-dev
2016-Nov-07 10:30 UTC
[llvm-dev] RFC: Convergent as an attribute for function arguments
On 31.10.2016 19:37, Justin Lebar wrote:> For one thing I'm in favor of having fewer annotations rather than > more, so if we can do this in a reasonable way without introducing the > notion of co-convergent calls, I think that would be a win. The one > convergent annotation is difficult enough for the GPU folks to grok > and then keep in cache, and everyone who works on llvm has to pay the > cost of keeping their passes compliant with these annotations.I now put an RFC implementation at https://reviews.llvm.org/D26348. The commit message contains a brief recap of what this is for. The LangRef language there has a proposal to help reduce the cognitive burden: if you think of the convergent attribute for functions as saying that "whether/when the function is called must be uniform across threads", then the convergent attribute for function _arguments_ is saying that "the actual argument that the function is called with must be uniform across threads". The consequences for how this restricts transforms are different, but the mental model is the same and I think effectively correct for both meanings. (As a minor bonus, the code delta is smaller if the same term is used for both ;-)). Please correct me if I'm missing something! The one thing that I'm not quite happy with is that there's no clean formalization of what the restriction really is. There's an okay-ish attempt in the LangRef diff of the patch, but to be honest I'd rather just leave that out if nothing better can be found. The difficulty I'm grappling with is that this involves both data flow and control flow, and I don't know how to express the combination properly without importing the whole semantics of a SIMT/SPMD-style execution model. Thanks, Nicolai
Reasonably Related Threads
- RFC: (Co-)Convergent functions and uniform function parameters
- RFC: (Co-)Convergent functions and uniform function parameters
- RFC: (Co-)Convergent functions and uniform function parameters
- RFC: (Co-)Convergent functions and uniform function parameters
- [LLVMdev] RFC: Convergent attribute