thr3ads.net - llvm dev - [LLVMdev] [RFC] "noclone" function attribute [Dec 2012]

James Molloy

2012-Dec-01 16:02 UTC

[LLVMdev] [RFC] "noclone" function attribute

Hi,

OpenCL has a "barrier" function with very specific semantics, and
there is currently no analogue to model this in LLVM.

This has been touched on by the SPIR folks but I don't believe they put
forward a proposal.

The barrier function is a special function that ensures that all workitems
executing a kernel have executed up to that point before execution on any
workitem can continue.

The CL spec is specific about how user kernels can use barriers - the sequence
of barriers that are hit by all workitems in a workgroup must be identical. An
issue occurs when defining what "the same barrier" actually means,
however. GPU Hardware, and CPU implementations such as Ralf Karrenberg's
(http://llvm.org/devmtg/2012-04-12/Slides/Ralf_Karrenberg.pdf) key off the PC,
so barrier call A and barrier call B are the same if and only if the PC value at
A and B is the same, for some definition of PC.

Last time this was mentioned, Eli suggested that keying off the PC was a bit
silly - it is my understanding that the next CL spec has "named
barriers" proposed, which give the key to the barrier function explicitly
as a parameter. However even if this is ratified, we (CL vendors) still need to
support the old behaviour of keying off the PC.

This (keying off the PC) has advantages in terms of implementation for the CPU.
For an example, and an example of how this can go wrong, see the end of this
message.

This can go wrong if a barrier call is cloned, which can currently happen in
loop unrolling, loop unswitching and jump threading. I believe multiple CL
vendors have hacked ad-hoc checks into these three areas - it'd be nice to
standardise this and reduce downstream hacks.

I'm proposing a new function attribute, "noclone", with the semantics that
calls to functions marked "noclone" cannot be cloned or duplicated into the
same function. That is, it is illegal to call J = I->clone() and then attach J
to the same function as I if I is a call to a function marked "noclone".

This means that cloning whole functions (CloneFunction and CloneFunctionInto)
will still work fine, but CloneBasicBlock with a new parent set equal to the old
parent (i.e. cloning a block in the same function) will assert.

I have a proof of concept patch for this but it's slightly out of date, so
I'll need to update it.

I'm envisaging a large group of people with torches and pitchforks walking
menacingly towards me right now, so without further ado I'll hand over to
them to tell me where I've gone wrong and why the idea is utterly
braindead...

Cheers,

James

EXAMPLE
======
Ralf Karrenberg proposed an algorithm which for a kernel like this:

kernel void k() {
  if (x())
    y();
  barrier();
  if (x())
    z();
  else
    w();
}

would split it up into sub-functions and produce a state machine and a loop
similar to this:

while (1) {
switch (state) {
case STATE_START:
  for (x...) for (y...) for (z...)
    state = kernel_START(x, y, z);
  break;

case STATE_BARRIER1:
  for (x...) for (y...) for (z...)
    state = kernel_BARRIER1(x, y, z);
  break;
  
case STATE_END:
  return;
}
}

where every kernel sub-function (kernel_START and kernel_BARRIER1 in this
example) returns a new state.

Notice this relies upon all calls to either kernel_START or kernel_BARRIER1
returning the *same* next state. This is guaranteed by the OpenCL spec.
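
A minimal, runnable sketch of the driver loop above, under simplifying assumptions: a 1-D range of work-items replaces the three nested loops, and x(), y(), z() and w() are replaced by hypothetical stand-ins (x depends only on the work-item id, the others just append to a log). The runKernel name and its agreement check are illustrative, not part of Karrenberg's implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the driver loop; names mirror the pseudocode above.
enum State { STATE_START, STATE_BARRIER1, STATE_END };

// Stand-in for x(): any pure per-work-item predicate would do.
inline bool x(int id) { return id % 2 == 0; }

// Region before the barrier: "if (x()) y();", then report the barrier reached.
inline State kernel_START(int id, std::vector<int> &log) {
  if (x(id)) log.push_back(id);    // stands in for y()
  return STATE_BARRIER1;           // every work-item reaches the same barrier
}

// Region after the barrier: "if (x()) z(); else w();".
inline State kernel_BARRIER1(int id, std::vector<int> &log) {
  log.push_back(x(id) ? -1 : -2);  // stands in for z() / w()
  return STATE_END;
}

// Runs all work-items one region at a time. Returns false if two work-items
// disagree on the next state, which is the invariant the driver relies on.
inline bool runKernel(int numItems, std::vector<int> &log) {
  assert(numItems > 0);
  State state = STATE_START;
  while (state != STATE_END) {
    State next = STATE_END;
    for (int id = 0; id < numItems; ++id) {
      State s = (state == STATE_START) ? kernel_START(id, log)
                                       : kernel_BARRIER1(id, log);
      if (id != 0 && s != next) return false;
      next = s;
    }
    state = next;
  }
  return true;
}
```

Because each region is run for all work-items before the next region starts, the barrier itself never has to block; it only marks where the kernel is split.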

Let's apply jump threading to that kernel:

kernel void k() {
  if (x()) {
    y();
    barrier();
    z();
  } else {
    barrier();
    w();
  }
}

Oh dear. Now, we'd end up creating a state machine with four states - START,
BARRIER1, BARRIER2 and END. It is no longer guaranteed that all workitems will
hit the same barrier, because we've broken an invariant the user guaranteed.
Our optimisation has been broken.
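
The failure can be reproduced in a small model. This is a sketch only: x() is a hypothetical per-work-item predicate, and kernel_START_threaded stands in for the first sub-function that would be generated from the jump-threaded kernel, where each copy of the barrier gets its own state.

```cpp
#include <cassert>

// Hypothetical states for the jump-threaded kernel: one per barrier copy.
enum State { STATE_START, STATE_BARRIER1, STATE_BARRIER2, STATE_END };

// Stand-in for x(): differs between work-items.
inline bool x(int id) { return id % 2 == 0; }

// After jump threading, the START region ends at one of two barrier copies,
// depending on which branch the work-item took.
inline State kernel_START_threaded(int id) {
  return x(id) ? STATE_BARRIER1 : STATE_BARRIER2;
}

// True if every work-item reports the same next state, which is what the
// state-machine driver needs in order to pick a single next region.
inline bool allAgree(int numItems) {
  assert(numItems > 0);
  State first = kernel_START_threaded(0);
  for (int id = 1; id < numItems; ++id)
    if (kernel_START_threaded(id) != first) return false;
  return true;
}
```

With a single work-item the result is trivially consistent; with two or more, work-items 0 and 1 report different next states and the driver has no single region to run next.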

Krzysztof Parzyszek

2012-Dec-01 16:22 UTC

On 12/1/2012 10:02 AM, James Molloy wrote:
> This means that cloning whole functions (CloneFunction and
> CloneFunctionInto) will still work [...].

Unfortunately, it won't work.

Assume all threads call foo:

foo() {
   ...
   bar(i)
   ...
}

bar(int i) {
   ...
   barrier();
   ...
}


Now, suppose that we have discovered that bar(0) can be greatly 
optimized and generate a call to the specialized version, bar_0:

foo() {
   ...
   if (i == 0) bar_0();
   else        bar(i);
   ...
}


And now we have multiple threads that no longer have a common barrier.


-Krzysztof


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

James Molloy

2012-Dec-01 16:36 UTC

Hi Krzysztof,

Yes, however this can be solved in one of two ways:

1) Fully inline the call graph for all leaf functions that call the barrier
intrinsic. This is done on several implementations as standard already, and
"no call stack" is a requirement for Karrenberg's algorithm at
least.

2) Apply the "noclone" attribute transitively such that if a function
may transitively call the barrier intrinsic, it is marked "noclone".
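
Option 2 amounts to a fixed-point propagation over the call graph. The sketch below assumes a toy call graph keyed by function name (CallGraph and propagateNoClone are hypothetical names, not LLVM APIs) and marks every function that can reach an already-marked callee.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: caller name -> names of the functions it calls directly.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Starting from the functions already marked (e.g. the barrier intrinsic),
// repeatedly mark any caller of a marked function until nothing changes.
inline std::set<std::string> propagateNoClone(const CallGraph &cg,
                                              std::set<std::string> marked) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &entry : cg) {
      if (marked.count(entry.first)) continue;  // already marked
      for (const std::string &callee : entry.second) {
        if (marked.count(callee)) {
          marked.insert(entry.first);
          changed = true;
          break;
        }
      }
    }
  }
  return marked;
}
```

For the foo/bar example in this thread, starting with only the barrier marked yields bar and foo marked as well, so bar could no longer be specialised into bar_0 alongside the original.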

Either of these methods allows the user to stop LLVM breaking their IR.
I'm aware that the general case with no user help (such as force-inlining,
or otherwise controlling function cloning) is a very difficult problem. My
intention is that there are no corner cases *with user assistance*. Currently
there is no way to stop stuff breaking *even with* user assistance! :)

Cheers,

James

Krzysztof Parzyszek

2012-Dec-01 17:19 UTC

On 12/1/2012 10:02 AM, James Molloy wrote:
> I'm proposing a new function attribute, "noclone", with the
> semantics that "calls to functions marked "noclone" cannot be
> cloned or duplicated into the same function.". That is, it is illegal to
> call J = I->clone() then attach J to the same basic block as I if I is marked
> "noclone".

The class Loop has something similar in it:

/// isSafeToClone - Return true if the loop body is safe to clone in practice.
/// Routines that reform the loop CFG and split edges often fail on indirectbr.
bool Loop::isSafeToClone() const {
  // Return false if any loop blocks contain indirectbrs.
  for (Loop::block_iterator I = block_begin(), E = block_end(); I != E; ++I) {
    if (isa<IndirectBrInst>((*I)->getTerminator()))
      return false;
  }
  return true;
}


Maybe a similar interface could be added to Instruction, and an 
instruction would declare itself unsafe to clone if it was a call to a 
function with the attribute that you are proposing.

I could imagine that this may not be the only example of an instruction
that should not be cloned.


I'd only suggest that the attribute is called something like 
"noclonecalls", or (preferably) something shorter that clarifies that 
it's the calls that shouldn't be cloned.  :)


-Krzysztof



James Molloy

2012-Dec-01 17:25 UTC

> Maybe a similar interface could be added to Instruction, and an 
> instruction would declare itself unsafe to clone if it was a call to a 
> function with the attribute that you are proposing.
I experimented with something similar to this, where Instruction::clone ensured
it wasn't "noclone" - if it was, it asserted. But that broke the
use-case of cloning whole functions.

My patch extends Loop::isSafeToClone to also check whether the loop contains a
CallInst whose callee is marked "noclone".
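
The shape of that extended check can be sketched on a toy IR. Everything here is an assumption for illustration: ToyInst, ToyBlock and ToyLoop are stand-ins rather than LLVM classes, and the patch itself is not shown in this thread.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for instructions, basic blocks and loops (not LLVM classes).
struct ToyInst {
  enum Kind { Other, IndirectBr, Call };
  Kind kind;
  bool calleeNoClone;  // meaningful only when kind == Call
};
using ToyBlock = std::vector<ToyInst>;
using ToyLoop = std::vector<ToyBlock>;

// Mirrors the idea of Loop::isSafeToClone: refuse if any block contains an
// indirectbr, and additionally refuse if any call targets a "noclone" callee.
inline bool isSafeToClone(const ToyLoop &loop) {
  for (const ToyBlock &bb : loop) {
    for (const ToyInst &inst : bb) {
      if (inst.kind == ToyInst::IndirectBr) return false;
      if (inst.kind == ToyInst::Call && inst.calleeNoClone) return false;
    }
  }
  return true;
}
```

Unlike an assertion in Instruction::clone, a query of this kind leaves whole-function cloning untouched, since only transforms that duplicate code within a function need to ask it.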

I agree about the naming, but have yet to think of something more snappy :)

Cheers,

James

Kuperstein, Michael M

2012-Dec-02 07:49 UTC

I definitely support this.

In fact we were about to send a very similar proposal. The main difference I can
see between this proposal and ours was that we named the attribute
"noduplicate".
I graciously defer to James on the bikeshed color issue.

Michael


Chris Lattner

2012-Dec-03 06:11 UTC

On Dec 1, 2012, at 11:49 PM, "Kuperstein, Michael M"
<michael.m.kuperstein at intel.com> wrote:
> I definitely support this.
>
> In fact we were about to send a very similar proposal. The main difference
> I can see between this proposal and ours was that we named the attribute
> "noduplicate".
> I graciously defer to James on the bikeshed color issue.

Yes, this sort of functionality is useful.  A few requests though:
1) please name it "noduplicate".  "cloning" has other naming
implications in llvm related to function bodies, but calls to a noduplicate
function should not be duplicated in any way (e.g. tail duplication, loop
unrolling, etc).
2) please have the llvm/Analysis/CodeMetrics.h code consider them to be
unduplicatable (generalizing the containsIndirectBr bit).
3) Please change random parts of the compiler to use CodeMetrics, instead of
scattering random checks for this attribute throughout the code.  Anything
duplicating code and not using CodeMetrics is just plain incorrect.
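
Requests 2 and 3 can be illustrated with a toy model. This is a sketch under stated assumptions: ToyCodeMetrics, its notDuplicatable flag and mayUnroll are hypothetical names that only imitate the role of llvm/Analysis/CodeMetrics.h, and the size limit is an arbitrary illustrative parameter.

```cpp
#include <cassert>
#include <vector>

// Toy instruction: just the two properties that forbid duplication here.
struct ToyInst {
  bool isIndirectBr;
  bool callsNoDuplicate;
};
using ToyBlock = std::vector<ToyInst>;

// One place that decides whether code may be duplicated, generalising an
// "indirectbr seen" bit into a single "not duplicatable" bit.
struct ToyCodeMetrics {
  bool notDuplicatable = false;
  unsigned numInsts = 0;

  void analyzeBlock(const ToyBlock &bb) {
    for (const ToyInst &inst : bb) {
      ++numInsts;
      if (inst.isIndirectBr || inst.callsNoDuplicate) notDuplicatable = true;
    }
  }
};

// A duplicating transform consults the metrics instead of checking the
// attribute itself, so the rule lives in exactly one place.
inline bool mayUnroll(const std::vector<ToyBlock> &loopBlocks,
                      unsigned sizeLimit) {
  ToyCodeMetrics metrics;
  for (const ToyBlock &bb : loopBlocks) metrics.analyzeBlock(bb);
  return !metrics.notDuplicatable && metrics.numInsts <= sizeLimit;
}
```

Any other transform that duplicates code (tail duplication, jump threading, unswitching) would make the same query, so adding a new kind of unduplicatable instruction later means changing only the metrics.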

-Chris
