thr3ads.net - llvm dev - [llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters [Oct 2016]

If this information is useful, please help other people find it:
Share via:

Mehdi Amini via llvm-dev

2016-Oct-24 23:17 UTC

[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters

> On Oct 24, 2016, at 4:15 PM, Nicolai Hähnle <nhaehnle at gmail.com>
wrote:
> 
> On 25.10.2016 01:11, Nicolai Hähnle wrote:
>> On 24.10.2016 21:54, Mehdi Amini wrote:
>>>> On Oct 24, 2016, at 12:38 PM, Nicolai Hähnle via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> Some brain-storming on an issue with SPMD/SIMT backend support
where
>>>> I think some additional IR attributes would be useful. Sorry
for the
>>>> somewhat long mail; the short version of my current thinking is
that
>>>> I would like to have the following:
>>>> 
>>>> 1) convergent: a call to a function with this attribute cannot
be
>>>> moved to have additional control dependencies; i.e., moving it
from A
>>>> to B is only possible if B dominates or post-dominates A.
>>>> 
>>>> 2) co-convergent (divergent? for lack of a better name...): a
call to
>>>> a function with this attribute cannot be moved to have _fewer_
>>>> control dependencies; i.e., moving it from A to B is only
possible if
>>>> A dominates or post-dominates B.
>>>> 
>>>> 3) uniform (for function arguments): transformations are not
allowed
>>>> to introduce additional non-uniformity in this argument.
>>> 
>>> Can you describe it in terms that are non-SPMD/SIMT?
>>> I.e. I’m not sure that “uniformity” refers to an existing LLVM IR
>>> concept.
>> 
>> Yeah, that's actually the key problem I've been struggling
with.
>> 
>> The first example I sent shows the gist of it. It also shows that the
>> concept can't be expressed in terms of the CFG, which makes this
tricky.
>> 
>> In a way it's the data-flow analog of the convergent attribute: the
>> argument cannot be changed to have additional dependencies in its
>> computation. That captures the basic intention, but I'm not
completely
>> sure that it works.
> 
> One big question is whether there are transformations in LLVM today that
replace a value %a with a value %b, where %b "has additional dependencies
in its computation". I can't think of anything obvious where that would
be the case, but I'm not sure.
What do you mean by “additional dependencies in its computation" here? 

— 
Mehdi


>>>> 
>>>> I'd appreciate input on this proposal, e.g. if this can be
solved in
>>>> an easier way or if there are obvious problems with this
approach.
>>>> 
>>>> Thanks,
>>>> Nicolai
>>>> 
>>>> ..........
>>>> In a nutshell, the problem that this intends to solve is that:
>>>> 
>>>> %v1 = texelFetch(%sampler, %coord0)
>>>> %v2 = texelFetch(%sampler, %coord1)
>>>> %v = select i1 %cond, vType %v1, %v2
>>>> 
>>>> is logically equivalent to and could benefit from being
transformed to,
>>>> 
>>>> %coord = select i1 %cond, cType %coord0, %coord1
>>>> %v = texelFetch(%sampler, %coord)
>>>> 
>>>> but on the other hand
>>>> 
>>>> %v1 = texelFetch(%sampler0, %coord)
>>>> %v2 = texelFetch(%sampler1, %coord)
>>>> %v = select i1 %cond, vType %v1, %v2
>>>> 
>>>> _must_not_ be transformed to
>>>> 
>>>> %s = select i1 %cond, sType %sampler0, %sampler1
>>>> %v = texelFetch(%s, %coord)
>>>> 
>>>> because of uniformity restrictions on the first argument of
texelFetch.
>>>> 
>>>> We currently have a shader that is mis-compiled in the wild
because
>>>> something much like the latter transform is done by
SimplifyCFG.[1]
>>>> There, the equivalent thing happens with phi nodes:
>>>> 
>>>> if:
>>>> %v.if = texelFetch(%sampler0, %coord)
>>>> br label %end
>>>> 
>>>> else:
>>>> %v.else = texelFetch(%sampler1, %coord)
>>>> br label %end
>>>> 
>>>> end:
>>>> %v = phi [ %v.if, %if ], [ %v.else, %else ]
>>>> 
>>>> becomes
>>>> 
>>>> ...
>>>> end:
>>>> %s = phi [ %sampler0, %if ], [ %sampler1, %else ]
>>>> %v = texelFetch(%s, %coord)
>>>> 
>>>> I have collected some more examples at [2]
>>>> 
>>>> The distinctions between all three attribute types mentioned
above
>>>> makes sense, because there are OpenGL shader intrinsics that
>>>> naturally carry almost all of the possible combinations of
those
>>>> attributes.
>>>> 
>>>> That said, a pass that is not SPMD/SIMT-aware must treat every
>>>> function call that has a uniform argument as if it were
>>>> co-convergent, because the input to the uniform argument could
have
>>>> non-uniformity whose structure correlates with control
dependencies;
>>>> see [2] for an example. Furthermore, I have not yet found an
>>>> operation that needs the co-convergent attribute without also
having
>>>> uniform arguments. So for the purposes of LLVM, it may be
sufficient
>>>> to add the 'uniform' attribute for function arguments.
>>>> 
>>>> 
>>>> [1] https://bugs.freedesktop.org/show_bug.cgi?id=97988
>>>> [2]
>>>>
http://nhaehnle.blogspot.de/2016/10/compiling-shaders-dynamically-uniform.html
>>>> 
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>

Nicolai Hähnle via llvm-dev

2016-Oct-25 14:28 UTC

head link

[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters

On 25.10.2016 01:17, Mehdi Amini wrote:>
>> On Oct 24, 2016, at 4:15 PM, Nicolai Hähnle <nhaehnle at
gmail.com> wrote:
>>
>> On 25.10.2016 01:11, Nicolai Hähnle wrote:
>>> On 24.10.2016 21:54, Mehdi Amini wrote:
>>>>> On Oct 24, 2016, at 12:38 PM, Nicolai Hähnle via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> Some brain-storming on an issue with SPMD/SIMT backend
support where
>>>>> I think some additional IR attributes would be useful.
Sorry for the
>>>>> somewhat long mail; the short version of my current
thinking is that
>>>>> I would like to have the following:
>>>>>
>>>>> 1) convergent: a call to a function with this attribute
cannot be
>>>>> moved to have additional control dependencies; i.e., moving
it from A
>>>>> to B is only possible if B dominates or post-dominates A.
>>>>>
>>>>> 2) co-convergent (divergent? for lack of a better name...):
a call to
>>>>> a function with this attribute cannot be moved to have
_fewer_
>>>>> control dependencies; i.e., moving it from A to B is only
possible if
>>>>> A dominates or post-dominates B.
>>>>>
>>>>> 3) uniform (for function arguments): transformations are
not allowed
>>>>> to introduce additional non-uniformity in this argument.
>>>>
>>>> Can you describe it in terms that are non-SPMD/SIMT?
>>>> I.e. I’m not sure that “uniformity” refers to an existing LLVM
IR
>>>> concept.
>>>
>>> Yeah, that's actually the key problem I've been struggling
with.
>>>
>>> The first example I sent shows the gist of it. It also shows that
the
>>> concept can't be expressed in terms of the CFG, which makes
this tricky.
>>>
>>> In a way it's the data-flow analog of the convergent attribute:
the
>>> argument cannot be changed to have additional dependencies in its
>>> computation. That captures the basic intention, but I'm not
completely
>>> sure that it works.
>>
>> One big question is whether there are transformations in LLVM today
that replace a value %a with a value %b, where %b "has additional
dependencies in its computation". I can't think of anything obvious
where that would be the case, but I'm not sure.
>
> What do you mean by “additional dependencies in its computation" here?
Minus phi-nodes, loads, and intrinsic calls, it could be: look at the 
directed graph where %a -> %b if %a is an operand of the instruction 
that defines %b. Take all nodes of that graph that do not have predecessors.

But I fear that this path leads to eternal fuzziness. Let me try a 
completely different approach to define what we need by augmenting the 
semantics of IR with "divergence tokens". In addition to its usual 
value, every IR value carries a "divergence set" of divergence tokens.

The basic rule is: the divergence set of a value is (at least) the union 
of the divergence sets of its operands.

Every function input carries a unique divergence token.

For phi-nodes, the divergence set also includes the branch conditions 
that can affect which predecessor value is taken. (A simpler, more 
conservative rule, would be to add a unique divergence token; this can 
be per-basic block.)

Loads add a unique divergence token. (This is a very conservative rule 
mostly required for per-thread-allocas. It could be relaxed quite a bit 
by treating per-thread-memory like SSA-values augmented with divergence 
sets.)

Atomic operations add a unique divergence token.

Additional target-specific intrinsics may also add a divergence token.

By transitivity, most function calls have to add a unique divergence token.

The first restriction on the transformation of a call to a function with 
'uniform' function argument is then:

1. The corresponding argument of the call can be changed only if the 
change preserves (or shrinks) the divergence set (on top of that, the 
new value obviously has to be equal to the old one in single threaded 
semantics).

With this definition,

def @fn(s0, s1, cond) {
   v0 = texelFetch(s0, ...)
   v1 = texelFetch(s1, ...)
   v = select i1 cond, v0, v1
}

cannot be transformed to

def @fn(s0, s1, cond) {
   s = select i1 cond, s0, s1
   v = texelFetch(s, ...)
}

because s has a larger divergence set.

On the other hand, at least the InstCombine transforms should all be 
safe because they locally transform a DAG of values in a way that can 
only shrink the set of predecessors of the locally transformed region of 
the DAG.

Transforms that modify the CFG could be problematic especially with the 
simpler phi-node-rule, e.g. when a new basic block is created, you end 
up introducing new divergence tokens for "downstream" values.
That's
probably not a problem with the first definition for phi-nodes, though 
I'm not sure.

.....
The above definition still allows the transformation of

   masked &= 1
   if (masked == 1) {
     v1 = texelFetch(array[masked], ...)
   } else {
     v0 = texelFetch(array[masked], ...)
   }
   v = phi(v1, v0)

to

   masked &= 1
   v = texelFetch(array[masked], ...);

which is incorrect. So for a set of divergence tokens D, let's say A 
D-dominates B if the following holds: there exists a way of fixing the 
branching directions of conditional branches whose condition's 
divergence set is disjoint from D, such that A dominates B in the 
resulting CFG. Similarly for D-post-domination.

If D \subset D', then D'-domination implies D-domination.

We can then formulate the second condition as:

2a. A function call with a uniform argument with divergence set D can be 
moved from location A to location B if A D-dominates or D-post-dominates B.

2b. Simpler but more conservatively, just treat the function call as 
co-convergent (i.e., it can only be moved to positions that are 
dominated or post-dominated by its current position in the usual sense.)

.....
The full definition of D-domination is probably too involved to be 
turned into practical algorithms. But with this model of divergence 
sets, it should be possible to think of DivergenceAnalysis as an 
analysis that proves divergence sets of values to be empty (using 
additional target specific information to be less conservative). 
Combined with DivergenceAnalysis, (1)+(2a) still leave room for some 
useful and practical optimizations of functions with uniform arguments 
even across basic blocks.

In practical terms for a first version, the effect of the attribute 
should just be to prevent the two transformations shown above.

Thanks,
Nicolai

Nicolai Hähnle via llvm-dev

2016-Oct-26 14:54 UTC

head link

[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters

On 25.10.2016 16:28, Nicolai Hähnle wrote:> But I fear that this path leads to eternal fuzziness. Let me try a
> completely different approach to define what we need by augmenting the
> semantics of IR with "divergence tokens". In addition to its
usual
> value, every IR value carries a "divergence set" of divergence
tokens.
>
> The basic rule is: the divergence set of a value is (at least) the union
> of the divergence sets of its operands.
>
> Every function input carries a unique divergence token.
>
> For phi-nodes, the divergence set also includes the branch conditions
> that can affect which predecessor value is taken. (A simpler, more
> conservative rule, would be to add a unique divergence token; this can
> be per-basic block.)
>
> Loads add a unique divergence token. (This is a very conservative rule
> mostly required for per-thread-allocas. It could be relaxed quite a bit
> by treating per-thread-memory like SSA-values augmented with divergence
> sets.)
>
> Atomic operations add a unique divergence token.
>
> Additional target-specific intrinsics may also add a divergence token.
>
> By transitivity, most function calls have to add a unique divergence token.
>
> The first restriction on the transformation of a call to a function with
> 'uniform' function argument is then:
>
> 1. The corresponding argument of the call can be changed only if the
> change preserves (or shrinks) the divergence set (on top of that, the
> new value obviously has to be equal to the old one in single threaded
> semantics).
Turns out that this isn't actually sufficient. Consider this (admittedly 
convoluted) example:

    idx = input & 1
    v0 = texelFetch(s[idx & 0], ...)
    v1 = texelFetch(s[idx | 1], ...)
    cond = trunc idx to i1
    v = select i1 cond, v1, v0

transformed into

    idx = in & 1
    v = texelFetch(s[idx], ...)

has the same divergence set for the argument to texelFetch as defined in 
the quote above, but the transformation is incorrect.

So it's pretty clear what we want: We want a way to flag certain 
function arguments in a way that forbids transformations of the form 
select(call ... , call ...) -> call (select ..., ...), ...

But it would be nice to have a clear definition of _why_ those 
transformations must be forbidden. It's not clear how to do that without 
pulling in a full model of SIMT-style parallel execution, and admittedly 
I don't think we have a sane model for _that_ in the first place :-(

Something that at least partially addresses the SIMT-style semantics: 
For every pair (initial state, function inputs) and every call site of 
the relevant function, keep a log of function arguments with which the 
function has been called.

The restriction on program transformations is roughly: two pairs A and B 
of (initial state, function inputs) are compatible when run on the 
original program if at each call site the log of A is a subsequence of 
the log of B or vice versa. Then every pair A,B that is compatible when 
run on the original program must also be compatible on the transformed 
program.

This isn't a great definition either, but it gets closer to the actual 
SIMT-semantics. Any additional ideas would be very much appreciated.

Thanks,
Nicolai
>
> With this definition,
>
> def @fn(s0, s1, cond) {
>   v0 = texelFetch(s0, ...)
>   v1 = texelFetch(s1, ...)
>   v = select i1 cond, v0, v1
> }
>
> cannot be transformed to
>
> def @fn(s0, s1, cond) {
>   s = select i1 cond, s0, s1
>   v = texelFetch(s, ...)
> }
>
> because s has a larger divergence set.
>
> On the other hand, at least the InstCombine transforms should all be
> safe because they locally transform a DAG of values in a way that can
> only shrink the set of predecessors of the locally transformed region of
> the DAG.
>
> Transforms that modify the CFG could be problematic especially with the
> simpler phi-node-rule, e.g. when a new basic block is created, you end
> up introducing new divergence tokens for "downstream" values.
That's
> probably not a problem with the first definition for phi-nodes, though
> I'm not sure.
>
> .....
> The above definition still allows the transformation of
>
>   masked &= 1
>   if (masked == 1) {
>     v1 = texelFetch(array[masked], ...)
>   } else {
>     v0 = texelFetch(array[masked], ...)
>   }
>   v = phi(v1, v0)
>
> to
>
>   masked &= 1
>   v = texelFetch(array[masked], ...);
>
> which is incorrect. So for a set of divergence tokens D, let's say A
> D-dominates B if the following holds: there exists a way of fixing the
> branching directions of conditional branches whose condition's
> divergence set is disjoint from D, such that A dominates B in the
> resulting CFG. Similarly for D-post-domination.
>
> If D \subset D', then D'-domination implies D-domination.
>
> We can then formulate the second condition as:
>
> 2a. A function call with a uniform argument with divergence set D can be
> moved from location A to location B if A D-dominates or D-post-dominates B.
>
> 2b. Simpler but more conservatively, just treat the function call as
> co-convergent (i.e., it can only be moved to positions that are
> dominated or post-dominated by its current position in the usual sense.)
>
> .....
> The full definition of D-domination is probably too involved to be
> turned into practical algorithms. But with this model of divergence
> sets, it should be possible to think of DivergenceAnalysis as an
> analysis that proves divergence sets of values to be empty (using
> additional target specific information to be less conservative).
> Combined with DivergenceAnalysis, (1)+(2a) still leave room for some
> useful and practical optimizations of functions with uniform arguments
> even across basic blocks.
>
> In practical terms for a first version, the effect of the attribute
> should just be to prevent the two transformations shown above.
>
> Thanks,
> Nicolai

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - Oct 2016 - RFC: (Co-)Convergent functions and uniform function parameters

[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters

[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters

[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters

Apparently Analagous Threads