thr3ads.net - llvm dev - [llvm-dev] RFC: make calls "convergent" by default [Jun 2021]

Sameer Sahasrabuddhe via llvm-dev

2021-Jun-01 11:58 UTC

[llvm-dev] RFC: make calls "convergent" by default

TL;DR
====
We propose the following changes to LLVM IR in order to better support
operations that are sensitive to the set of threads that execute them
together:

- Redefine "convergent" in terms of thread divergence in a
  multi-threaded execution.
- Fix all optimizations that examine the "convergent" attribute to also
  depend on divergence analysis. This avoids any impact on CPU
  compilation since control flow is always uniform on CPUs.
- Make all function calls "convergent" by default (D69498). Introduce a
  new "noconvergent" attribute, and make "convergent" a nop.
- Update the "convergence tokens" proposal to take into account this new
  default property (D85603).

Motivation
=========
This effort is necessary because the current "convergent" attribute is
considered under-defined and sorely needs replacement.

1. On GPU targets, the "convergent" attribute is required for
   correctness. This is unlike other attributes, which are used only
   as optimization hints. Merely missing an attribute should never
   result in a miscompilation, yet a missing "convergent" can.

2. The current definition of the "convergent" attribute does not precisely
   represent the constraints on the compiler for a GPU target. The
   actual implementation in LLVM sources is far more conservative than
   the definition requires.

3. Due to the same lack of precision, the attribute cannot properly
   represent the side-effects of jump threading on a GPU program.

Background
=========
This RFC is a continuation of a discussion split across the following
two reviews. The two reviews compose well to cover all the shortcomings
of the convergent attribute.

  D69498: IR: Invert convergent attribute handling
  https://reviews.llvm.org/D69498

The above review aims to make all function calls "convergent" by
default, but it received strong opposition due to the requirement that
CPU frontends must now emit a new "noconvergent" attribute on every
function call.

  D85603: IR: Add convergence control operand bundle and intrinsics
  https://reviews.llvm.org/D85603

The above review defines a "convergent operation" in terms of divergent
control flow in multi-threaded executions. It introduces a "convergence
token" passed as an operand bundle argument at a call, representing the
set of threads that together execute that call. This review has
progressed to the point where there don't seem to be any major
objections to it, but there is some interest in combining it with the
original idea of making all calls convergent by default.

Terms Used
=========
The following definitions are paraphrased from D85603:

Convergent Operation

  Some parallel execution environments execute threads in groups that
  allow efficient communication within each group. When control flow
  diverges, i.e. threads of the same group follow different paths
  through the CFG, not all threads of the group may be available to
  participate in this communication. A convergent operation involves
  inter-thread communication or synchronization that occurs outside of
  the memory model, where the set of threads which participate in
  communication is implicitly affected by control flow.
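To make the definition concrete, here is a small C++ simulation. It is not LLVM code, and the names Mask, ballotCount, ballotInsideBranch and ballotHoisted are hypothetical. It models one group of lanes and a ballot-style operation that returns the number of lanes executing it, assuming that the lanes which take the same side of a branch execute it together. Hoisting the ballot above a divergent branch changes its result, which is exactly what makes such an operation convergent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One lockstep group of lanes; active[i] is true while lane i is executing.
using Mask = std::vector<bool>;

// Hypothetical ballot-style convergent operation: every participating lane
// receives the number of lanes that execute this same dynamic instance.
int ballotCount(const Mask &active) {
  int count = 0;
  for (bool isActive : active)
    if (isActive)
      ++count;
  return count;
}

// Original program:  if (laneId < limit) { r = ballot(); }
// Only lanes that take the branch participate. Returns the value observed
// by those lanes (0 if no lane enters the branch).
int ballotInsideBranch(std::size_t numLanes, std::size_t limit) {
  Mask active(numLanes, false);
  for (std::size_t lane = 0; lane < numLanes; ++lane)
    active[lane] = lane < limit; // condition depends on lane ID: may diverge
  return ballotCount(active);
}

// Invalid transform: the ballot is hoisted above the branch, so every lane
// of the group now participates in the same dynamic instance.
int ballotHoisted(std::size_t numLanes) {
  Mask active(numLanes, true);
  return ballotCount(active);
}
```

When the branch is uniform (for example, limit >= numLanes), both versions agree. That is the case the proposal lets optimizations exploit through divergence analysis.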

Dynamic Instance

  Every execution of an LLVM IR instruction occurs in a dynamic instance
  of the instruction. Different executions of the same instruction by a
  single thread give rise to different dynamic instances of that
  instruction. Executions of different instructions always occur in
  different dynamic instances. Executions of the same instruction by
  different threads may occur in the same dynamic instance. When
  executing a convergent operation, the set of threads that execute the
  same dynamic instance is the set of threads that communicate with each
  other for that operation.
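As a rough illustration of dynamic instances, the following sketch assumes one particular rule: executions of a convergent call in the same loop iteration by different threads share a dynamic instance. That rule is an assumption made for the example. Making such choices explicit is the job of the convergence tokens in D85603. Each iteration executed by a single thread is a separate dynamic instance for that thread, and the map records which threads communicate in each instance. The names InstanceId, ThreadSet and dynamicInstances are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical model of the dynamic instances of one convergent call inside
// a loop. ASSUMPTION: threads executing the call in the same loop iteration
// share a dynamic instance, identified here by the iteration number.
using InstanceId = std::size_t;
using ThreadSet = std::set<std::size_t>;

// tripCounts[t] is the number of loop iterations executed by thread t.
std::map<InstanceId, ThreadSet>
dynamicInstances(const std::vector<std::size_t> &tripCounts) {
  std::map<InstanceId, ThreadSet> instances;
  for (std::size_t t = 0; t < tripCounts.size(); ++t)
    for (std::size_t iter = 0; iter < tripCounts[t]; ++iter)
      // Different iterations of thread t land in different instances;
      // the same iteration of different threads lands in the same one.
      instances[iter].insert(t);
  return instances;
}
```

With trip counts {3, 1, 2}, instance 0 is shared by all three threads, instance 1 only by threads 0 and 2, and instance 2 is executed by thread 0 alone.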

Optimization Constraints due to Convergent Calls
===============================================
In general, an optimization that modifies control flow in the program
must ensure that the set of threads executing each dynamic instance of a
convergent call is not affected.

By default, every call in LLVM IR is assumed to be convergent. A
frontend may further relax this in the following ways:

  1. The "noconvergent" attribute may be added to indicate that a call
     is not sensitive to the set of threads executing any dynamic
     instance of that call.

  2. A "convergencectrl" operand bundle may be passed to the call. The
     semantics of such a "token" provide fine-grained control over the
     transforms possible near the callsite.
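The default described above can be summarized as a predicate. CallSiteInfo, isConvergent and isTokenControlled are hypothetical stand-ins, not the llvm::CallBase API. A call is convergent unless it carries "noconvergent", and the old "convergent" attribute no longer changes the answer.

```cpp
#include <cassert>

// Hypothetical summary of a call site; this is NOT the llvm::CallBase API.
struct CallSiteInfo {
  bool hasNoConvergentAttr = false;      // frontend emitted "noconvergent"
  bool hasConvergentAttr = false;        // legacy "convergent": now a no-op
  bool hasConvergenceCtrlBundle = false; // "convergencectrl" operand bundle
};

// Proposed default: every call is convergent unless marked "noconvergent".
// hasConvergentAttr is deliberately ignored.
bool isConvergent(const CallSiteInfo &call) {
  return !call.hasNoConvergentAttr;
}

// A convergence token does not remove the constraint; it identifies the set
// of threads that must execute the call together, allowing finer reasoning.
bool isTokenControlled(const CallSiteInfo &call) {
  return isConvergent(call) && call.hasConvergenceCtrlBundle;
}
```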

The overall effect is to make the notion of convergence and divergence a
universal property of LLVM IR. This provides a "safe default" in the IR
semantics, so that frontends and optimizations cannot produce incorrect
IR on a GPU target by merely missing an attribute.

At the same time, there is no effect on CPU optimizations. An
optimization may use divergence analysis along with the above
information to determine if a transformation is possible. The only
impact on CPU compilation flows is the addition of divergence analysis
as a dependency when checking for convergent operations. This analysis
is trivial on CPUs, where branches are never divergent and hence all
control flow is uniform.
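One way to see why the analysis is trivial on CPUs is a deliberately simplified divergence analysis over a hypothetical mini-IR. The names Kind, Value and computeDivergence are made up for this sketch. It tracks data dependence only; a real divergence analysis also tracks divergence introduced through control dependence, which is omitted here. A value is divergent if it depends on the lane ID. When a group contains a single thread, as assumed for CPUs, nothing can be divergent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mini-IR, for illustration only.
enum class Kind {
  LaneId,  // differs per lane: the source of divergence
  Uniform, // same for all lanes (constants, kernel arguments, ...)
  Op       // computed from earlier values
};

struct Value {
  Kind kind;
  std::vector<std::size_t> operands; // indices of earlier values (Kind::Op)
};

// Data-dependence-only divergence propagation. Values must be listed in
// definition order, so every operand index is smaller than its user's.
std::vector<bool> computeDivergence(const std::vector<Value> &values,
                                    std::size_t lanesPerGroup) {
  std::vector<bool> divergent(values.size(), false);
  if (lanesPerGroup <= 1)
    return divergent; // CPU-like: a group of one thread can never diverge
  for (std::size_t i = 0; i < values.size(); ++i) {
    const Value &v = values[i];
    if (v.kind == Kind::LaneId) {
      divergent[i] = true;
    } else if (v.kind == Kind::Op) {
      for (std::size_t op : v.operands)
        if (divergent[op])
          divergent[i] = true;
    }
  }
  return divergent;
}
```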

Implementation
=============
The above proposal will be implemented as follows:

1. Optimizations that check for convergent operations will be updated to
   depend on divergence analysis. For example, the following change will
   be made in llvm/lib/Transforms/Scalar/Sink.cpp:

   Before:

     bool isSafeToMove(Instruction *Inst) {
         ...
         if (auto *Call = dyn_cast<CallBase>(Inst)) {
             ...
             if (Call->isConvergent())
                 return false;
             ...
         }
     }


   After:

     bool isSafeToMove(Instruction *Inst, DivergenceAnalysis &DA, ...) {
         ...
         // don't sink a convergent call across a divergent branch
         if (auto *Call = dyn_cast<CallBase>(Inst)) {
             ...
             auto *Term = Inst->getParent()->getTerminator();
             if (Call->isConvergent() && DA.isDivergent(Term))
                 return false;
             ...
         }
     }
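The Before and After rules can be compared on mock types. MockCall, MockBlock, legacyAllowsSink and convergenceAllowsSink are hypothetical, and all other legality checks in Sink.cpp are left out. With every call convergent by default, the legacy rule would block sinking of every call. The new rule blocks it only when the parent block ends in a divergent branch, which never happens on a CPU.

```cpp
#include <cassert>

// Hypothetical stand-ins for CallBase and for a block plus DivergenceAnalysis.
struct MockCall {
  bool convergent; // true for every call unless marked "noconvergent"
};
struct MockBlock {
  bool terminatorIsDivergent; // what DA.isDivergent(Term) would report
};

// "Before": any convergent call blocks sinking.
bool legacyAllowsSink(const MockCall &call) { return !call.convergent; }

// "After": only a convergent call whose block ends in a divergent branch
// is blocked.
bool convergenceAllowsSink(const MockCall &call, const MockBlock &parent) {
  return !(call.convergent && parent.terminatorIsDivergent);
}
```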

2. D69498 will be updated so that the convergent property becomes the
   default, but the new requirements on CPU frontends will be retracted.

3. D85603 will be revised to include the new default convergent
   property.

Thanks,
Sameer.

Sameer Sahasrabuddhe via llvm-dev

2021-Jun-02 06:02 UTC

[llvm-dev] RFC: make calls "convergent" by default

CC'ing some more people who got dropped when sending the previous mail.

Sameer.
