thr3ads.net - llvm dev - [LLVMdev] Instructions that cannot be duplicated [Oct 2009]

Villmow, Micah

2009-Oct-07 18:20 UTC

[LLVMdev] Instructions that cannot be duplicated

Is there currently a way to specify that an instruction or function call
cannot be duplicated, so that any optimization that wants to duplicate
it would fail?

 

The problem concerns barrier in OpenCL 1.0. One of the conditions of
using barrier is that if a barrier exists inside control flow, every
thread in a work-group must execute the barrier instruction (6.11.9).

 

However, this simple CL code:
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics: enable

__kernel void
KMeansMapReduceAtomic(const int  num_attributes,
                      const int  num_objects,
                      __global int* delta_d)
{
    __local int clusterCount[256];
    __local int sTemp[1];  // amd opencl needed this to be an array

    const unsigned int point_id = get_local_id(0);
    int index = 0;
    int i, addr;
    int xx = get_local_id(0);

    clusterCount[xx] = 0;
    if (get_local_id(0) == 0) {
        sTemp[0] = 0;  // sTemp is for prefix sum
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    int idWithinCluster = 300;  // anything other than zero
    if (point_id < num_objects) {
        idWithinCluster = atom_add(&clusterCount[index], 1);
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    int numMembers = 2;
    if (idWithinCluster == 0) {
        // This holds the prefix offset
        clusterCount[index] = atom_add(&sTemp[0], numMembers);
    }
    delta_d[xx] = clusterCount[index];
}

 

produces a bitcode file that has three barriers, although the source
contains only two.

 

The problem is that the second if/barrier pair:

if (point_id < num_objects) {
    idWithinCluster = atom_add(&clusterCount[index], 1);
}
barrier(CLK_LOCAL_MEM_FENCE);

is transformed into control flow equivalent to:

if (point_id >= num_objects) {
    barrier(CLK_LOCAL_MEM_FENCE);
} else {
    idWithinCluster = atom_add(&clusterCount[index], 1);
    barrier(CLK_LOCAL_MEM_FENCE);
}

This violates the OpenCL requirement and can cause undefined behavior on
the underlying hardware, because each barrier is unique.
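
[Editorial sketch] The two control-flow shapes above compute the same result for a single thread, but they differ in which barrier call site each work-item reaches. The following host-side C simulation is only an illustration under stated assumptions: the SITE_* labels and the functions site_original, site_duplicated, and group_converges are hypothetical names, not part of OpenCL or LLVM, and real hardware behavior is not modeled.

```c
#include <stdbool.h>

/* Hypothetical call-site labels; not part of OpenCL or LLVM. */
enum { SITE_MERGED = 0, SITE_THEN_COPY = 1, SITE_ELSE_COPY = 2 };

/* Original shape: the barrier follows the if, so every work-item
   reaches the same call site regardless of the branch taken. */
int site_original(unsigned point_id, unsigned num_objects)
{
    if (point_id < num_objects) {
        /* atom_add(...) would run here */
    }
    return SITE_MERGED;
}

/* Transformed shape: the barrier has been copied into both arms,
   so the call site reached depends on the branch condition. */
int site_duplicated(unsigned point_id, unsigned num_objects)
{
    if (point_id >= num_objects) {
        return SITE_THEN_COPY;
    } else {
        /* atom_add(...) would run here */
        return SITE_ELSE_COPY;
    }
}

/* Returns true when work-items 0..group_size-1 all reach one site. */
bool group_converges(int (*site)(unsigned, unsigned),
                     unsigned group_size, unsigned num_objects)
{
    int first = site(0, num_objects);
    for (unsigned id = 1; id < group_size; ++id) {
        if (site(id, num_objects) != first)
            return false;
    }
    return true;
}
```

For example, with a group of 8 work-items and num_objects equal to 4, group_converges(site_original, 8, 4) is true while group_converges(site_duplicated, 8, 4) is false, which is the situation described above.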

 

So we want to disable all optimizations around barrier instructions, but
not in other cases when no barrier instruction exists. One way to do
this is to mark an instruction as not being copyable, but is there a
method of doing this in LLVM?

 

Also, this barrier does not map to llvm.barrier because llvm.barrier
only seems to worry about memory operations and not synchronization
between threads.

 

Thanks for any help,

Micah



shreyas krishnan

2009-Oct-07 21:57 UTC

If I may ask a perhaps unrelated question: can clang in trunk compile
this CL program, with the two OPENCL EXTENSION pragmas and the __kernel
keyword?

thanks
shrey




Eli Friedman

2009-Oct-08 00:49 UTC

On Wed, Oct 7, 2009 at 11:20 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> Is there a current way to specify that an instruction or function call
> cannot be duplicated and thus any optimizations that might want to
> duplicate this instruction would fail?

No.  Anything can be duplicated.  That could change, but you would
need to make a strong case for why other solutions won't work.

> which violates opencl, which can cause undefined behavior on the underlying
> hardware, as each barrier is unique.

If you need to maintain uniqueness, there are other ways to do that
without preventing the global from being duplicated.  For example, you
could associate each barrier with a unique global by making the
intrinsic take it as a parameter.

-Eli
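
[Editorial sketch] One way to read the suggestion above: each barrier written in the source gets its own global, and every call passes that global, so copies made by an optimizer still name the same logical barrier. The host-side C below is a minimal illustration only. The token variables and the functions barrier_with_token, barrier_reached_original, and barrier_reached_duplicated are hypothetical stand-ins, not an LLVM intrinsic or an OpenCL builtin.

```c
/* Hypothetical tokens: one global per barrier written in the source. */
char first_barrier_token;
char second_barrier_token;

/* Hypothetical stand-in for a barrier intrinsic that takes a token.
   It only reports which logical barrier was reached. */
const void *barrier_with_token(const void *token)
{
    return token;
}

/* Original shape: a single call site after the if. */
const void *barrier_reached_original(unsigned point_id, unsigned num_objects)
{
    if (point_id < num_objects) {
        /* atom_add(...) would run here */
    }
    return barrier_with_token(&second_barrier_token);
}

/* Duplicated shape: two call sites, both carrying the same token. */
const void *barrier_reached_duplicated(unsigned point_id, unsigned num_objects)
{
    if (point_id >= num_objects) {
        return barrier_with_token(&second_barrier_token);
    } else {
        /* atom_add(...) would run here */
        return barrier_with_token(&second_barrier_token);
    }
}
```

In this sketch both arms of the duplicated form return the address of second_barrier_token, so a consumer of the code can still tell that the two copies came from one source-level barrier. Whether a given target can act on that information is a separate question.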

Villmow, Micah

2009-Oct-08 17:49 UTC

> -----Original Message-----
> From: Eli Friedman [mailto:eli.friedman at gmail.com]
> Sent: Wednesday, October 07, 2009 5:50 PM
> To: Villmow, Micah
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Instructions that cannot be duplicated
>
> On Wed, Oct 7, 2009 at 11:20 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> > Is there a current way to specify that an instruction or function call
> > cannot be duplicated and thus any optimizations that might want to
> > duplicate this instruction would fail?
>
> No.  Anything can be duplicated.  That could change, but you would
> need to make a strong case for why other solutions won't work.

[Villmow, Micah] Well, the problem is that the function in question
cannot be duplicated, because it has side effects and duplicating it
causes undefined behavior on vector hardware. Also, moving the
instruction inside of flow control when it is originally outside of flow
control produces undefined behavior. There is currently no way to
specify this in LLVM that I know of. We've tried lowering it to an
intrinsic and setting MayWriteMem, and this does not solve the problem.
After looking at the LLVM IR, there is no equivalent method of
representing an instruction that is an execution barrier (not a memory
barrier, which llvm.barrier.[ss|ll|ls|sl] is). If you have any ideas,
we would be willing to give them a try.


On the unique barrier issue, even if the barrier is given a unique
global identifier, it is the function duplication that causes the
problem. A unique global identifier lets us identify that invalid
optimizations have occurred, but it does not guarantee correctness since
the barrier function is unique per function call. So any sort of
duplication is invalid.
Micah
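
[Editorial sketch] The point that a unique identifier lets one detect, but not prevent, an invalid duplication can be illustrated with a small check over a list of barrier calls. This is a host-side C sketch under stated assumptions: struct barrier_call and count_duplicated_barriers are hypothetical names, and the records are assumed to have been collected from the compiled code by some other means.

```c
#include <stddef.h>

/* Hypothetical record: which token a given barrier call site passes. */
struct barrier_call {
    int call_site;  /* position of the call in the compiled code */
    int token;      /* identifier of the source-level barrier */
};

/* Counts call sites whose token was already used by an earlier call
   site in the list. A nonzero result means some barrier was copied. */
size_t count_duplicated_barriers(const struct barrier_call *calls, size_t n)
{
    size_t duplicates = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < i; ++j) {
            if (calls[j].token == calls[i].token) {
                ++duplicates;
                break;  /* count each extra call site once */
            }
        }
    }
    return duplicates;
}
```

A check like this only reports that a duplication happened. As the message above notes, it does not undo the transformation or make the duplicated code valid.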
