thr3ads.net - llvm dev - [llvm-dev] [LLVMdev] RFC: Convergent attribute [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Owen Anderson

2015-May-13 20:17 UTC

[LLVMdev] RFC: Convergent attribute

Below is a proposal for a new "convergent" intrinsic attribute and
MachineInstr property, needed for correctly modeling many SPMD/SIMT programming
models in LLVM.  Comments and feedback welcome.

—Owen





In order to make LLVM more suitable for programming models variously called SPMD
and SIMT, we would like to propose a new intrinsic and MachineInstr annotation
called "convergent", which will be used to impose certain control flow
and/or
code motion constraints that are necessary for the correct compilation of some
common constructs in these programming models.

Our proposal strives to define the semantics of these annotations *without*
introducing a definition of SPMD/SIMT programming models into LLVM IR.  Rather,
the properties that must be preserved are specified purely in terms of single
thread semantics.  This allows pass authors to reason about the constraints
without having to consider alternative programming models.  The downside to
this approach is that the motivation and necessity of these constraints in not
easily understood without understanding the programming model from which they
derive.

*** WHAT ***

(Thanks to Phil Reames for input on this definition.)

An operation marked convergent may be transformed or moved within the program
if and only the post-transform placement of the convergent operation is
control equivalent (A dominated B, B post-dominates A, or vice-versa) to
its original position.

This definition is overly strict with respect to some SPMD/SIMT models,
but cannot be relaxed without introducing a specific model into LLVM IR. We
believe it is important for LLVM itself to remain agnostic to any specific
model.  This allows core passes to preserve correctness for stricter models,
while more relaxed models can implement additional transforms that use
weaker constraints on top of core LLVM.

*** HOW ***

Once the attribute has been added, we anticipate the following changes to
optimization passes will be required:
  - Restrict Sink and MachineSink for convergent operations
  - Disabling PRE for convergent operations
  - Disabling jump threading of convergent operations
  - Auditing SimplifyCFG for additional transforms that break convergent
guarantees

*** WHY ***

SPMD/SIMT programming models are a family of related programming models in
which multiple threads execute in a per-instruction lockstep fashion.
Predication is typically used to implement acyclic control flow that would
otherwise diverge the PC address of the lockstep threads.

In these models, each thread's register set is typically indepedent, but
there
exist a small number of important circumstances in which a thread may access
register storage from one of its lockstep neighbors.  Examples include gradient
computation for texture lookups, as well a cross-thread broadcast and shuffle
operations.

These operations that provide access to another thread's register storage
pose
a particular challenge to the compiler, particularly when combined with the
use of predication for control flow.  Consider the following example:

// texture lookup that computes gradient of r0, last use of r0
r1 = texture2D(..., r0, ...)
if (...) {
  // r0 used as temporary here
  r0 = ...
  r2 = r0 + ...
} else {
  // only use of r1
  r2 = r1 + ...
}

In this example, various optimizations might try to sink the texture2D operation
into the else block, like so:

if (...) {
  r0 = ...
  r2 = r0 + ...
} else {
  r1 = texture2D(..., r0, ...)
  r2 = r1 + ...
}

At this point, it starts to become clear that a problem can occur when two
neighbor threads want to take different paths through the if-else construct.
Logically, the thread that wishes to execute the texture2D races with its
neighbor to reads the neighbor's value of r0 before it gets overridden.

In most SPMD/SIMT implementations, the fallout of this races is exposed via
the predicated expression of acyclic control flow:

pred0 <- cmp ...
if (pred0)  r0 = ...
if (pred0)  r2 = r0 + ...
if (!pred0) r1 = texture2D(..., r0, ...)
if (!pred0) r2 = r1 + ...

If thread 0 takes the else path and perform the texture2D operation, but
its neighbor thread 1 takes the then branch, then the texture2D will fail
because thread 1 has already overwritten its value of r0 before thread 0 has
a chance to read it.

Chandler Carruth

2015-May-13 22:50 UTC

head link

[LLVMdev] RFC: Convergent attribute

Just so I understand, this only would apply to intrinsic calls? Are there
any instructions that would make sense to carry a flag with the same
semantics?

Generally, I like this a lot. Using control equivalence as the fundamental
model is a very nice and simple rule to follow. As long as the wiggle room
between it and the technical limitations in SPMD models, fantastic.

-Chandler

On Wed, May 13, 2015 at 2:24 PM Owen Anderson <resistor at mac.com> wrote:
> Below is a proposal for a new "convergent" intrinsic attribute
and
> MachineInstr property, needed for correctly modeling many SPMD/SIMT
> programming models in LLVM.  Comments and feedback welcome.
>
> —Owen
>
>
>
>
>
> In order to make LLVM more suitable for programming models variously
> called SPMD
> and SIMT, we would like to propose a new intrinsic and MachineInstr
> annotation
> called "convergent", which will be used to impose certain control
flow
> and/or
> code motion constraints that are necessary for the correct compilation of
> some
> common constructs in these programming models.
>
> Our proposal strives to define the semantics of these annotations *without*
> introducing a definition of SPMD/SIMT programming models into LLVM IR.
> Rather,
> the properties that must be preserved are specified purely in terms of
> single
> thread semantics.  This allows pass authors to reason about the constraints
> without having to consider alternative programming models.  The downside to
> this approach is that the motivation and necessity of these constraints in
> not
> easily understood without understanding the programming model from which
> they
> derive.
>
> *** WHAT ***
>
> (Thanks to Phil Reames for input on this definition.)
>
> An operation marked convergent may be transformed or moved within the
> program
> if and only the post-transform placement of the convergent operation is
> control equivalent (A dominated B, B post-dominates A, or vice-versa) to
> its original position.
>
> This definition is overly strict with respect to some SPMD/SIMT models,
> but cannot be relaxed without introducing a specific model into LLVM IR. We
> believe it is important for LLVM itself to remain agnostic to any specific
> model.  This allows core passes to preserve correctness for stricter
> models,
> while more relaxed models can implement additional transforms that use
> weaker constraints on top of core LLVM.
>
> *** HOW ***
>
> Once the attribute has been added, we anticipate the following changes to
> optimization passes will be required:
>   - Restrict Sink and MachineSink for convergent operations
>   - Disabling PRE for convergent operations
>   - Disabling jump threading of convergent operations
>   - Auditing SimplifyCFG for additional transforms that break convergent
> guarantees
>
> *** WHY ***
>
> SPMD/SIMT programming models are a family of related programming models in
> which multiple threads execute in a per-instruction lockstep fashion.
> Predication is typically used to implement acyclic control flow that would
> otherwise diverge the PC address of the lockstep threads.
>
> In these models, each thread's register set is typically indepedent,
but
> there
> exist a small number of important circumstances in which a thread may
> access
> register storage from one of its lockstep neighbors.  Examples include
> gradient
> computation for texture lookups, as well a cross-thread broadcast and
> shuffle
> operations.
>
> These operations that provide access to another thread's register
storage
> pose
> a particular challenge to the compiler, particularly when combined with the
> use of predication for control flow.  Consider the following example:
>
> // texture lookup that computes gradient of r0, last use of r0
> r1 = texture2D(..., r0, ...)
> if (...) {
>   // r0 used as temporary here
>   r0 = ...
>   r2 = r0 + ...
> } else {
>   // only use of r1
>   r2 = r1 + ...
> }
>
> In this example, various optimizations might try to sink the texture2D
> operation
> into the else block, like so:
>
> if (...) {
>   r0 = ...
>   r2 = r0 + ...
> } else {
>   r1 = texture2D(..., r0, ...)
>   r2 = r1 + ...
> }
>
> At this point, it starts to become clear that a problem can occur when two
> neighbor threads want to take different paths through the if-else
> construct.
> Logically, the thread that wishes to execute the texture2D races with its
> neighbor to reads the neighbor's value of r0 before it gets overridden.
>
> In most SPMD/SIMT implementations, the fallout of this races is exposed via
> the predicated expression of acyclic control flow:
>
> pred0 <- cmp ...
> if (pred0)  r0 = ...
> if (pred0)  r2 = r0 + ...
> if (!pred0) r1 = texture2D(..., r0, ...)
> if (!pred0) r2 = r1 + ...
>
> If thread 0 takes the else path and perform the texture2D operation, but
> its neighbor thread 1 takes the then branch, then the texture2D will fail
> because thread 1 has already overwritten its value of r0 before thread 0
> has
> a chance to read it.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150513/e2554818/attachment.html>

Owen Anderson

2015-May-13 23:21 UTC

head link

[LLVMdev] RFC: Convergent attribute

> On May 13, 2015, at 3:50 PM, Chandler Carruth <chandlerc at
google.com> wrote:
> 
> Just so I understand, this only would apply to intrinsic calls? Are there
any instructions that would make sense to carry a flag with the same semantics?
I can’t think of a reason why it would apply to anything other than intrinsic
calls at the LLVM IR level.  At the MI level it probably needs to be a property
that can apply to any opcode.

—Owen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150513/5a7d8b5e/attachment.html>

Eric Christopher

2015-May-13 23:32 UTC

head link

[LLVMdev] RFC: Convergent attribute

On Wed, May 13, 2015 at 1:24 PM Owen Anderson <resistor at mac.com> wrote:
> Below is a proposal for a new "convergent" intrinsic attribute
and
> MachineInstr property, needed for correctly modeling many SPMD/SIMT
> programming models in LLVM.  Comments and feedback welcome.
>
>We've spoken about this before, but for the record it totally makes sense
to me and sounds useful.

-eric

> —Owen
>
>
>
>
>
> In order to make LLVM more suitable for programming models variously
> called SPMD
> and SIMT, we would like to propose a new intrinsic and MachineInstr
> annotation
> called "convergent", which will be used to impose certain control
flow
> and/or
> code motion constraints that are necessary for the correct compilation of
> some
> common constructs in these programming models.
>
> Our proposal strives to define the semantics of these annotations *without*
> introducing a definition of SPMD/SIMT programming models into LLVM IR.
> Rather,
> the properties that must be preserved are specified purely in terms of
> single
> thread semantics.  This allows pass authors to reason about the constraints
> without having to consider alternative programming models.  The downside to
> this approach is that the motivation and necessity of these constraints in
> not
> easily understood without understanding the programming model from which
> they
> derive.
>
> *** WHAT ***
>
> (Thanks to Phil Reames for input on this definition.)
>
> An operation marked convergent may be transformed or moved within the
> program
> if and only the post-transform placement of the convergent operation is
> control equivalent (A dominated B, B post-dominates A, or vice-versa) to
> its original position.
>
> This definition is overly strict with respect to some SPMD/SIMT models,
> but cannot be relaxed without introducing a specific model into LLVM IR. We
> believe it is important for LLVM itself to remain agnostic to any specific
> model.  This allows core passes to preserve correctness for stricter
> models,
> while more relaxed models can implement additional transforms that use
> weaker constraints on top of core LLVM.
>
> *** HOW ***
>
> Once the attribute has been added, we anticipate the following changes to
> optimization passes will be required:
>   - Restrict Sink and MachineSink for convergent operations
>   - Disabling PRE for convergent operations
>   - Disabling jump threading of convergent operations
>   - Auditing SimplifyCFG for additional transforms that break convergent
> guarantees
>
> *** WHY ***
>
> SPMD/SIMT programming models are a family of related programming models in
> which multiple threads execute in a per-instruction lockstep fashion.
> Predication is typically used to implement acyclic control flow that would
> otherwise diverge the PC address of the lockstep threads.
>
> In these models, each thread's register set is typically indepedent,
but
> there
> exist a small number of important circumstances in which a thread may
> access
> register storage from one of its lockstep neighbors.  Examples include
> gradient
> computation for texture lookups, as well a cross-thread broadcast and
> shuffle
> operations.
>
> These operations that provide access to another thread's register
storage
> pose
> a particular challenge to the compiler, particularly when combined with the
> use of predication for control flow.  Consider the following example:
>
> // texture lookup that computes gradient of r0, last use of r0
> r1 = texture2D(..., r0, ...)
> if (...) {
>   // r0 used as temporary here
>   r0 = ...
>   r2 = r0 + ...
> } else {
>   // only use of r1
>   r2 = r1 + ...
> }
>
> In this example, various optimizations might try to sink the texture2D
> operation
> into the else block, like so:
>
> if (...) {
>   r0 = ...
>   r2 = r0 + ...
> } else {
>   r1 = texture2D(..., r0, ...)
>   r2 = r1 + ...
> }
>
> At this point, it starts to become clear that a problem can occur when two
> neighbor threads want to take different paths through the if-else
> construct.
> Logically, the thread that wishes to execute the texture2D races with its
> neighbor to reads the neighbor's value of r0 before it gets overridden.
>
> In most SPMD/SIMT implementations, the fallout of this races is exposed via
> the predicated expression of acyclic control flow:
>
> pred0 <- cmp ...
> if (pred0)  r0 = ...
> if (pred0)  r2 = r0 + ...
> if (!pred0) r1 = texture2D(..., r0, ...)
> if (!pred0) r2 = r1 + ...
>
> If thread 0 takes the else path and perform the texture2D operation, but
> its neighbor thread 1 takes the then branch, then the texture2D will fail
> because thread 1 has already overwritten its value of r0 before thread 0
> has
> a chance to read it.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150513/ee0e9e63/attachment.html>

Philip Reames

2015-May-14 01:00 UTC

head link

[LLVMdev] RFC: Convergent attribute

LGTM.  Please put pretty much everything in this email into a 
documentation page.  Doesn't have to be LangRef, but definitely 
something linked from there.

Also, it would be good to explicitly say that this is working around a 
limitation in the register allocator.  Just because it's a limitation 
which would be very hard to address doesn't mean it isn't a limitation. 
:)

Philip

On 05/13/2015 01:17 PM, Owen Anderson wrote:> Below is a proposal for a new "convergent" intrinsic attribute
and MachineInstr property, needed for correctly modeling many SPMD/SIMT
programming models in LLVM.  Comments and feedback welcome.
>
> —Owen
>
>
>
>
>
> In order to make LLVM more suitable for programming models variously called
SPMD
> and SIMT, we would like to propose a new intrinsic and MachineInstr
annotation
> called "convergent", which will be used to impose certain control
flow and/or
> code motion constraints that are necessary for the correct compilation of
some
> common constructs in these programming models.
>
> Our proposal strives to define the semantics of these annotations *without*
> introducing a definition of SPMD/SIMT programming models into LLVM IR. 
Rather,
> the properties that must be preserved are specified purely in terms of
single
> thread semantics.  This allows pass authors to reason about the constraints
> without having to consider alternative programming models.  The downside to
> this approach is that the motivation and necessity of these constraints in
not
> easily understood without understanding the programming model from which
they
> derive.
>
> *** WHAT ***
>
> (Thanks to Phil Reames for input on this definition.)
>
> An operation marked convergent may be transformed or moved within the
program
> if and only the post-transform placement of the convergent operation is
> control equivalent (A dominated B, B post-dominates A, or vice-versa) to
> its original position.
>
> This definition is overly strict with respect to some SPMD/SIMT models,
> but cannot be relaxed without introducing a specific model into LLVM IR. We
> believe it is important for LLVM itself to remain agnostic to any specific
> model.  This allows core passes to preserve correctness for stricter
models,
> while more relaxed models can implement additional transforms that use
> weaker constraints on top of core LLVM.
>
> *** HOW ***
>
> Once the attribute has been added, we anticipate the following changes to
> optimization passes will be required:
>    - Restrict Sink and MachineSink for convergent operations
>    - Disabling PRE for convergent operations
>    - Disabling jump threading of convergent operations
>    - Auditing SimplifyCFG for additional transforms that break convergent
guarantees
>
> *** WHY ***
>
> SPMD/SIMT programming models are a family of related programming models in
> which multiple threads execute in a per-instruction lockstep fashion.
> Predication is typically used to implement acyclic control flow that would
> otherwise diverge the PC address of the lockstep threads.
>
> In these models, each thread's register set is typically indepedent,
but there
> exist a small number of important circumstances in which a thread may
access
> register storage from one of its lockstep neighbors.  Examples include
gradient
> computation for texture lookups, as well a cross-thread broadcast and
shuffle
> operations.
>
> These operations that provide access to another thread's register
storage pose
> a particular challenge to the compiler, particularly when combined with the
> use of predication for control flow.  Consider the following example:
>
> // texture lookup that computes gradient of r0, last use of r0
> r1 = texture2D(..., r0, ...)
> if (...) {
>    // r0 used as temporary here
>    r0 = ...
>    r2 = r0 + ...
> } else {
>    // only use of r1
>    r2 = r1 + ...
> }
>
> In this example, various optimizations might try to sink the texture2D
operation
> into the else block, like so:
>
> if (...) {
>    r0 = ...
>    r2 = r0 + ...
> } else {
>    r1 = texture2D(..., r0, ...)
>    r2 = r1 + ...
> }
>
> At this point, it starts to become clear that a problem can occur when two
> neighbor threads want to take different paths through the if-else
construct.
> Logically, the thread that wishes to execute the texture2D races with its
> neighbor to reads the neighbor's value of r0 before it gets overridden.
>
> In most SPMD/SIMT implementations, the fallout of this races is exposed via
> the predicated expression of acyclic control flow:
>
> pred0 <- cmp ...
> if (pred0)  r0 = ...
> if (pred0)  r2 = r0 + ...
> if (!pred0) r1 = texture2D(..., r0, ...)
> if (!pred0) r2 = r1 + ...
>
> If thread 0 takes the else path and perform the texture2D operation, but
> its neighbor thread 1 takes the then branch, then the texture2D will fail
> because thread 1 has already overwritten its value of r0 before thread 0
has
> a chance to read it.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Matthias Braun

2015-May-14 03:19 UTC

head link

[LLVMdev] RFC: Convergent attribute

Why is this a regalloc problem? I assume in the example below the "r0"
is somehow forced by the ABI? Because otherwise moving the texture2d operation
into the branch wouldn't matter as long as we assign different registers to
the two branches and use a technique like
lib/Target/R600/SIFixSGPRLiveRanges.cpp.

- Matthias
> On May 13, 2015, at 6:00 PM, Philip Reames <listmail at
philipreames.com> wrote:
> 
> LGTM.  Please put pretty much everything in this email into a documentation
page.  Doesn't have to be LangRef, but definitely something linked from
there.
> 
> Also, it would be good to explicitly say that this is working around a
limitation in the register allocator.  Just because it's a limitation which
would be very hard to address doesn't mean it isn't a limitation.  :)
> 
> Philip
> 
> On 05/13/2015 01:17 PM, Owen Anderson wrote:
>> Below is a proposal for a new "convergent" intrinsic
attribute and MachineInstr property, needed for correctly modeling many
SPMD/SIMT programming models in LLVM.  Comments and feedback welcome.
>> 
>> —Owen
>> 
>> 
>> 
>> 
>> 
>> In order to make LLVM more suitable for programming models variously
called SPMD
>> and SIMT, we would like to propose a new intrinsic and MachineInstr
annotation
>> called "convergent", which will be used to impose certain
control flow and/or
>> code motion constraints that are necessary for the correct compilation
of some
>> common constructs in these programming models.
>> 
>> Our proposal strives to define the semantics of these annotations
*without*
>> introducing a definition of SPMD/SIMT programming models into LLVM IR. 
Rather,
>> the properties that must be preserved are specified purely in terms of
single
>> thread semantics.  This allows pass authors to reason about the
constraints
>> without having to consider alternative programming models.  The
downside to
>> this approach is that the motivation and necessity of these constraints
in not
>> easily understood without understanding the programming model from
which they
>> derive.
>> 
>> *** WHAT ***
>> 
>> (Thanks to Phil Reames for input on this definition.)
>> 
>> An operation marked convergent may be transformed or moved within the
program
>> if and only the post-transform placement of the convergent operation is
>> control equivalent (A dominated B, B post-dominates A, or vice-versa)
to
>> its original position.
>> 
>> This definition is overly strict with respect to some SPMD/SIMT models,
>> but cannot be relaxed without introducing a specific model into LLVM
IR. We
>> believe it is important for LLVM itself to remain agnostic to any
specific
>> model.  This allows core passes to preserve correctness for stricter
models,
>> while more relaxed models can implement additional transforms that use
>> weaker constraints on top of core LLVM.
>> 
>> *** HOW ***
>> 
>> Once the attribute has been added, we anticipate the following changes
to
>> optimization passes will be required:
>>   - Restrict Sink and MachineSink for convergent operations
>>   - Disabling PRE for convergent operations
>>   - Disabling jump threading of convergent operations
>>   - Auditing SimplifyCFG for additional transforms that break
convergent guarantees
>> 
>> *** WHY ***
>> 
>> SPMD/SIMT programming models are a family of related programming models
in
>> which multiple threads execute in a per-instruction lockstep fashion.
>> Predication is typically used to implement acyclic control flow that
would
>> otherwise diverge the PC address of the lockstep threads.
>> 
>> In these models, each thread's register set is typically
indepedent, but there
>> exist a small number of important circumstances in which a thread may
access
>> register storage from one of its lockstep neighbors.  Examples include
gradient
>> computation for texture lookups, as well a cross-thread broadcast and
shuffle
>> operations.
>> 
>> These operations that provide access to another thread's register
storage pose
>> a particular challenge to the compiler, particularly when combined with
the
>> use of predication for control flow.  Consider the following example:
>> 
>> // texture lookup that computes gradient of r0, last use of r0
>> r1 = texture2D(..., r0, ...)
>> if (...) {
>>   // r0 used as temporary here
>>   r0 = ...
>>   r2 = r0 + ...
>> } else {
>>   // only use of r1
>>   r2 = r1 + ...
>> }
>> 
>> In this example, various optimizations might try to sink the texture2D
operation
>> into the else block, like so:
>> 
>> if (...) {
>>   r0 = ...
>>   r2 = r0 + ...
>> } else {
>>   r1 = texture2D(..., r0, ...)
>>   r2 = r1 + ...
>> }
>> 
>> At this point, it starts to become clear that a problem can occur when
two
>> neighbor threads want to take different paths through the if-else
construct.
>> Logically, the thread that wishes to execute the texture2D races with
its
>> neighbor to reads the neighbor's value of r0 before it gets
overridden.
>> 
>> In most SPMD/SIMT implementations, the fallout of this races is exposed
via
>> the predicated expression of acyclic control flow:
>> 
>> pred0 <- cmp ...
>> if (pred0)  r0 = ...
>> if (pred0)  r2 = r0 + ...
>> if (!pred0) r1 = texture2D(..., r0, ...)
>> if (!pred0) r2 = r1 + ...
>> 
>> If thread 0 takes the else path and perform the texture2D operation,
but
>> its neighbor thread 1 takes the then branch, then the texture2D will
fail
>> because thread 1 has already overwritten its value of r0 before thread
0 has
>> a chance to read it.
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Mehdi Amini

2015-May-14 15:33 UTC

head link

[LLVMdev] RFC: Convergent attribute

> On May 13, 2015, at 6:00 PM, Philip Reames <listmail at
philipreames.com> wrote:
> 
> LGTM.  Please put pretty much everything in this email into a documentation
page.  Doesn't have to be LangRef, but definitely something linked from
there.
> 
> Also, it would be good to explicitly say that this is working around a
limitation in the register allocator.  Just because it's a limitation which
would be very hard to address doesn't mean it isn't a limitation.  :)
Only a subset of the problem (architecture specific) can be addressed by a
“clever” register allocator I think. The general problem goes beyond.

— 
Mehdi
> 
> Philip
> 
> On 05/13/2015 01:17 PM, Owen Anderson wrote:
>> Below is a proposal for a new "convergent" intrinsic
attribute and MachineInstr property, needed for correctly modeling many
SPMD/SIMT programming models in LLVM.  Comments and feedback welcome.
>> 
>> —Owen
>> 
>> 
>> 
>> 
>> 
>> In order to make LLVM more suitable for programming models variously
called SPMD
>> and SIMT, we would like to propose a new intrinsic and MachineInstr
annotation
>> called "convergent", which will be used to impose certain
control flow and/or
>> code motion constraints that are necessary for the correct compilation
of some
>> common constructs in these programming models.
>> 
>> Our proposal strives to define the semantics of these annotations
*without*
>> introducing a definition of SPMD/SIMT programming models into LLVM IR. 
Rather,
>> the properties that must be preserved are specified purely in terms of
single
>> thread semantics.  This allows pass authors to reason about the
constraints
>> without having to consider alternative programming models.  The
downside to
>> this approach is that the motivation and necessity of these constraints
in not
>> easily understood without understanding the programming model from
which they
>> derive.
>> 
>> *** WHAT ***
>> 
>> (Thanks to Phil Reames for input on this definition.)
>> 
>> An operation marked convergent may be transformed or moved within the
program
>> if and only the post-transform placement of the convergent operation is
>> control equivalent (A dominated B, B post-dominates A, or vice-versa)
to
>> its original position.
>> 
>> This definition is overly strict with respect to some SPMD/SIMT models,
>> but cannot be relaxed without introducing a specific model into LLVM
IR. We
>> believe it is important for LLVM itself to remain agnostic to any
specific
>> model.  This allows core passes to preserve correctness for stricter
models,
>> while more relaxed models can implement additional transforms that use
>> weaker constraints on top of core LLVM.
>> 
>> *** HOW ***
>> 
>> Once the attribute has been added, we anticipate the following changes
to
>> optimization passes will be required:
>>   - Restrict Sink and MachineSink for convergent operations
>>   - Disabling PRE for convergent operations
>>   - Disabling jump threading of convergent operations
>>   - Auditing SimplifyCFG for additional transforms that break
convergent guarantees
>> 
>> *** WHY ***
>> 
>> SPMD/SIMT programming models are a family of related programming models
in
>> which multiple threads execute in a per-instruction lockstep fashion.
>> Predication is typically used to implement acyclic control flow that
would
>> otherwise diverge the PC address of the lockstep threads.
>> 
>> In these models, each thread's register set is typically
indepedent, but there
>> exist a small number of important circumstances in which a thread may
access
>> register storage from one of its lockstep neighbors.  Examples include
gradient
>> computation for texture lookups, as well a cross-thread broadcast and
shuffle
>> operations.
>> 
>> These operations that provide access to another thread's register
storage pose
>> a particular challenge to the compiler, particularly when combined with
the
>> use of predication for control flow.  Consider the following example:
>> 
>> // texture lookup that computes gradient of r0, last use of r0
>> r1 = texture2D(..., r0, ...)
>> if (...) {
>>   // r0 used as temporary here
>>   r0 = ...
>>   r2 = r0 + ...
>> } else {
>>   // only use of r1
>>   r2 = r1 + ...
>> }
>> 
>> In this example, various optimizations might try to sink the texture2D
operation
>> into the else block, like so:
>> 
>> if (...) {
>>   r0 = ...
>>   r2 = r0 + ...
>> } else {
>>   r1 = texture2D(..., r0, ...)
>>   r2 = r1 + ...
>> }
>> 
>> At this point, it starts to become clear that a problem can occur when
two
>> neighbor threads want to take different paths through the if-else
construct.
>> Logically, the thread that wishes to execute the texture2D races with
its
>> neighbor to reads the neighbor's value of r0 before it gets
overridden.
>> 
>> In most SPMD/SIMT implementations, the fallout of this races is exposed
via
>> the predicated expression of acyclic control flow:
>> 
>> pred0 <- cmp ...
>> if (pred0)  r0 = ...
>> if (pred0)  r2 = r0 + ...
>> if (!pred0) r1 = texture2D(..., r0, ...)
>> if (!pred0) r2 = r1 + ...
>> 
>> If thread 0 takes the else path and perform the texture2D operation,
but
>> its neighbor thread 1 takes the then branch, then the texture2D will
fail
>> because thread 1 has already overwritten its value of r0 before thread
0 has
>> a chance to read it.
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Jingyue Wu via llvm-dev

2015-Aug-13 22:12 UTC

head link

[llvm-dev] [LLVMdev] RFC: Convergent attribute

Hi Owen,

According to your design, is LLVM supposed to (partially) disallow inlining
a function that has convergent instructions? It's hard to define control
equivalent inter-procedurally. For example, if a function containing a
convergent instruction is called at two call sites, inlining the function
produces two convergent instructions. Neither of the two is control
equivalent to the original, but they combined are in some sense.

I came across this when I am thinking whether __syncthreads in CUDA should
be tagged "convergent'. Right now, it's tagged as noduplicate so
inlining
and loop unrolling are disallowed. But I think noduplicate is too strong
for the semantics of convergent.

Jingyue

On Wed, May 13, 2015 at 1:17 PM, Owen Anderson <resistor at mac.com>
wrote:
> Below is a proposal for a new "convergent" intrinsic attribute
and
> MachineInstr property, needed for correctly modeling many SPMD/SIMT
> programming models in LLVM.  Comments and feedback welcome.
>
> —Owen
>
>
>
>
>
> In order to make LLVM more suitable for programming models variously
> called SPMD
> and SIMT, we would like to propose a new intrinsic and MachineInstr
> annotation
> called "convergent", which will be used to impose certain control
flow
> and/or
> code motion constraints that are necessary for the correct compilation of
> some
> common constructs in these programming models.
>
> Our proposal strives to define the semantics of these annotations *without*
> introducing a definition of SPMD/SIMT programming models into LLVM IR.
> Rather,
> the properties that must be preserved are specified purely in terms of
> single
> thread semantics.  This allows pass authors to reason about the constraints
> without having to consider alternative programming models.  The downside to
> this approach is that the motivation and necessity of these constraints in
> not
> easily understood without understanding the programming model from which
> they
> derive.
>
> *** WHAT ***
>
> (Thanks to Phil Reames for input on this definition.)
>
> An operation marked convergent may be transformed or moved within the
> program
> if and only the post-transform placement of the convergent operation is
> control equivalent (A dominated B, B post-dominates A, or vice-versa) to
> its original position.
>
> This definition is overly strict with respect to some SPMD/SIMT models,
> but cannot be relaxed without introducing a specific model into LLVM IR. We
> believe it is important for LLVM itself to remain agnostic to any specific
> model.  This allows core passes to preserve correctness for stricter
> models,
> while more relaxed models can implement additional transforms that use
> weaker constraints on top of core LLVM.
>
> *** HOW ***
>
> Once the attribute has been added, we anticipate the following changes to
> optimization passes will be required:
>   - Restrict Sink and MachineSink for convergent operations
>   - Disabling PRE for convergent operations
>   - Disabling jump threading of convergent operations
>   - Auditing SimplifyCFG for additional transforms that break convergent
> guarantees
>
> *** WHY ***
>
> SPMD/SIMT programming models are a family of related programming models in
> which multiple threads execute in a per-instruction lockstep fashion.
> Predication is typically used to implement acyclic control flow that would
> otherwise diverge the PC address of the lockstep threads.
>
> In these models, each thread's register set is typically indepedent,
but
> there
> exist a small number of important circumstances in which a thread may
> access
> register storage from one of its lockstep neighbors.  Examples include
> gradient
> computation for texture lookups, as well a cross-thread broadcast and
> shuffle
> operations.
>
> These operations that provide access to another thread's register
storage
> pose
> a particular challenge to the compiler, particularly when combined with the
> use of predication for control flow.  Consider the following example:
>
> // texture lookup that computes gradient of r0, last use of r0
> r1 = texture2D(..., r0, ...)
> if (...) {
>   // r0 used as temporary here
>   r0 = ...
>   r2 = r0 + ...
> } else {
>   // only use of r1
>   r2 = r1 + ...
> }
>
> In this example, various optimizations might try to sink the texture2D
> operation
> into the else block, like so:
>
> if (...) {
>   r0 = ...
>   r2 = r0 + ...
> } else {
>   r1 = texture2D(..., r0, ...)
>   r2 = r1 + ...
> }
>
> At this point, it starts to become clear that a problem can occur when two
> neighbor threads want to take different paths through the if-else
> construct.
> Logically, the thread that wishes to execute the texture2D races with its
> neighbor to reads the neighbor's value of r0 before it gets overridden.
>
> In most SPMD/SIMT implementations, the fallout of this races is exposed via
> the predicated expression of acyclic control flow:
>
> pred0 <- cmp ...
> if (pred0)  r0 = ...
> if (pred0)  r2 = r0 + ...
> if (!pred0) r1 = texture2D(..., r0, ...)
> if (!pred0) r2 = r1 + ...
>
> If thread 0 takes the else path and perform the texture2D operation, but
> its neighbor thread 1 takes the then branch, then the texture2D will fail
> because thread 1 has already overwritten its value of r0 before thread 0
> has
> a chance to read it.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150813/d37c2e02/attachment.html>

Owen Anderson via llvm-dev

2015-Aug-14 04:43 UTC

head link

[llvm-dev] [LLVMdev] RFC: Convergent attribute

Hi Jingyue,

Convergent is not intended to prevent inlining.  It’s tricky to formalize this
inter-procedurally, but the intended interpretation is that a convergent
operation cannot be move either into or out of a conditionally executed region. 
Normal inlining would not violate that.

I would imagine that it would make sense to use a combination of convergent and
noduplicate for barrier-like operations.

—Owen
> On Aug 13, 2015, at 3:12 PM, Jingyue Wu <jingyue at google.com>
wrote:
> 
> Hi Owen, 
> 
> According to your design, is LLVM supposed to (partially) disallow inlining
a function that has convergent instructions? It's hard to define control
equivalent inter-procedurally. For example, if a function containing a
convergent instruction is called at two call sites, inlining the function
produces two convergent instructions. Neither of the two is control equivalent
to the original, but they combined are in some sense.
> 
> I came across this when I am thinking whether __syncthreads in CUDA should
be tagged "convergent'. Right now, it's tagged as noduplicate so
inlining and loop unrolling are disallowed. But I think noduplicate is too
strong for the semantics of convergent.
> 
> Jingyue
> 
> On Wed, May 13, 2015 at 1:17 PM, Owen Anderson <resistor at mac.com
<mailto:resistor at mac.com>> wrote:
> Below is a proposal for a new "convergent" intrinsic attribute
and MachineInstr property, needed for correctly modeling many SPMD/SIMT
programming models in LLVM.  Comments and feedback welcome.
> 
> —Owen
> 
> 
> 
> 
> 
> In order to make LLVM more suitable for programming models variously called
SPMD
> and SIMT, we would like to propose a new intrinsic and MachineInstr
annotation
> called "convergent", which will be used to impose certain control
flow and/or
> code motion constraints that are necessary for the correct compilation of
some
> common constructs in these programming models.
> 
> Our proposal strives to define the semantics of these annotations *without*
> introducing a definition of SPMD/SIMT programming models into LLVM IR. 
Rather,
> the properties that must be preserved are specified purely in terms of
single
> thread semantics.  This allows pass authors to reason about the constraints
> without having to consider alternative programming models.  The downside to
> this approach is that the motivation and necessity of these constraints in
not
> easily understood without understanding the programming model from which
they
> derive.
> 
> *** WHAT ***
> 
> (Thanks to Phil Reames for input on this definition.)
> 
> An operation marked convergent may be transformed or moved within the
program
> if and only the post-transform placement of the convergent operation is
> control equivalent (A dominated B, B post-dominates A, or vice-versa) to
> its original position.
> 
> This definition is overly strict with respect to some SPMD/SIMT models,
> but cannot be relaxed without introducing a specific model into LLVM IR. We
> believe it is important for LLVM itself to remain agnostic to any specific
> model.  This allows core passes to preserve correctness for stricter
models,
> while more relaxed models can implement additional transforms that use
> weaker constraints on top of core LLVM.
> 
> *** HOW ***
> 
> Once the attribute has been added, we anticipate the following changes to
> optimization passes will be required:
>   - Restrict Sink and MachineSink for convergent operations
>   - Disabling PRE for convergent operations
>   - Disabling jump threading of convergent operations
>   - Auditing SimplifyCFG for additional transforms that break convergent
guarantees
> 
> *** WHY ***
> 
> SPMD/SIMT programming models are a family of related programming models in
> which multiple threads execute in a per-instruction lockstep fashion.
> Predication is typically used to implement acyclic control flow that would
> otherwise diverge the PC address of the lockstep threads.
> 
> In these models, each thread's register set is typically indepedent,
but there
> exist a small number of important circumstances in which a thread may
access
> register storage from one of its lockstep neighbors.  Examples include
gradient
> computation for texture lookups, as well a cross-thread broadcast and
shuffle
> operations.
> 
> These operations that provide access to another thread's register
storage pose
> a particular challenge to the compiler, particularly when combined with the
> use of predication for control flow.  Consider the following example:
> 
> // texture lookup that computes gradient of r0, last use of r0
> r1 = texture2D(..., r0, ...)
> if (...) {
>   // r0 used as temporary here
>   r0 = ...
>   r2 = r0 + ...
> } else {
>   // only use of r1
>   r2 = r1 + ...
> }
> 
> In this example, various optimizations might try to sink the texture2D
operation
> into the else block, like so:
> 
> if (...) {
>   r0 = ...
>   r2 = r0 + ...
> } else {
>   r1 = texture2D(..., r0, ...)
>   r2 = r1 + ...
> }
> 
> At this point, it starts to become clear that a problem can occur when two
> neighbor threads want to take different paths through the if-else
construct.
> Logically, the thread that wishes to execute the texture2D races with its
> neighbor to reads the neighbor's value of r0 before it gets overridden.
> 
> In most SPMD/SIMT implementations, the fallout of this races is exposed via
> the predicated expression of acyclic control flow:
> 
> pred0 <- cmp ...
> if (pred0)  r0 = ...
> if (pred0)  r2 = r0 + ...
> if (!pred0) r1 = texture2D(..., r0, ...)
> if (!pred0) r2 = r1 + ...
> 
> If thread 0 takes the else path and perform the texture2D operation, but
> its neighbor thread 1 takes the then branch, then the texture2D will fail
> because thread 1 has already overwritten its value of r0 before thread 0
has
> a chance to read it.
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>        
http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150813/88b65e89/attachment.html>

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Aug 2015 - [LLVMdev] RFC: Convergent attribute

[LLVMdev] RFC: Convergent attribute

[LLVMdev] RFC: Convergent attribute

[LLVMdev] RFC: Convergent attribute

[LLVMdev] RFC: Convergent attribute

[LLVMdev] RFC: Convergent attribute

[LLVMdev] RFC: Convergent attribute

[LLVMdev] RFC: Convergent attribute

[llvm-dev] [LLVMdev] RFC: Convergent attribute

[llvm-dev] [LLVMdev] RFC: Convergent attribute

Reasonably Related Threads