Is there a current way to specify that an instruction or function call cannot be duplicated and thus any optimizations that might want to duplicate this instruction would fail? The problem deals with barrier in OpenCL 1.0. One of the conditions of using barrier is that if a barrier exists inside of control flow, every thread in a work-group must execute the barrier instruction(6.11.9). However, in this simple CL code: #pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics: enable __kernel void KMeansMapReduceAtomic(const int num_attributes, const int num_objects, __global int* delta_d ) { __local int clusterCount[256]; __local int sTemp[1]; // amd opencl needed this to be an array const unsigned int point_id = get_local_id(0); int index = 0; int i, addr; int xx = get_local_id(0); clusterCount[xx] = 0; if(get_local_id(0) == 0){ sTemp[0] = 0; //sTemp is for prefix sum } barrier(CLK_LOCAL_MEM_FENCE); int idWithinCluster = 300; // anthing other then zero if (point_id < num_objects) { idWithinCluster = atom_add(&clusterCount [index],1); } barrier(CLK_LOCAL_MEM_FENCE); int numMembers = 2; if(idWithinCluster == 0) { clusterCount[index] atom_add(&sTemp[0], numMembers);//This holds the prefix offset } delta_d[xx] = clusterCount[index]; } produces bitcode file which has 3 barriers. The problem is now that the second if/barrier pair: if (point_id < num_objects) { idWithinCluster = atom_add(&clusterCount [index],1); } barrier(CLK_LOCAL_MEM_FENCE); is transformed into flow control equivalent to : if (point_id >= num_objects) { barrier(CLK_LOCAL_MEM_FENCE); } else { idWithinCluster = atom_add(&clusterCount [index],1); barrier(CLK_LOCAL_MEM_FENCE); } which violates opencl, which can cause undefined behavior on the underlying hardware, as each barrier is unique. So we want to disable all optimizations around barrier instructions, but not in other cases when no barrier instruction exists. One way to do this is to mark an instruction as not being copyable, but is there a method of doing this in LLVM? Also, this barrier does not map to llvm.barrier because llvm.barrier only seems to worry about memory operations and not synchronization between threads. Thanks for any help, Micah -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091007/5cebd323/attachment.html>
If I may -an unrelated question perhaps- can clang in trunk compile this CL program with the 2 OPENCL EXTENSION and keyword __kernel? thanks shrey On Wed, Oct 7, 2009 at 11:20 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote:> Is there a current way to specify that an instruction or function call > cannot be duplicated and thus any optimizations that might want to duplicate > this instruction would fail? > > > > The problem deals with barrier in OpenCL 1.0. One of the conditions of using > barrier is that if a barrier exists inside of control flow, every thread in > a work-group must execute the barrier instruction(6.11.9). > > > > However, in this simple CL code: > #pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable > > #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics: enable > > __kernel void > > KMeansMapReduceAtomic(const int num_attributes, > > const int num_objects, > > __global int* delta_d > > ) > > { > > __local int clusterCount[256]; > > __local int sTemp[1]; // amd opencl needed this to be an > array > > const unsigned int point_id = get_local_id(0); > > int index = 0; > > int i, addr; > > int xx = get_local_id(0); > > clusterCount[xx] = 0; > > if(get_local_id(0) == 0){ > > sTemp[0] = 0; //sTemp is for prefix sum > > } > > barrier(CLK_LOCAL_MEM_FENCE); > > int idWithinCluster = 300; // anthing other then zero > > if (point_id < num_objects) { > > idWithinCluster = atom_add(&clusterCount > > [index],1); > > } > > barrier(CLK_LOCAL_MEM_FENCE); > > int numMembers = 2; > > if(idWithinCluster == 0) { > > clusterCount[index] = atom_add(&sTemp[0], > numMembers);//This holds the prefix offset > > } > > delta_d[xx] = clusterCount[index]; > > } > > > > produces bitcode file which has 3 barriers. > > > > The problem is now that the second if/barrier pair: > > if (point_id < num_objects) { > > idWithinCluster = atom_add(&clusterCount > > [index],1); > > } > > barrier(CLK_LOCAL_MEM_FENCE); > > > > > > is transformed into flow control equivalent to : > > if (point_id >= num_objects) { > > barrier(CLK_LOCAL_MEM_FENCE); > > } else { > > idWithinCluster = atom_add(&clusterCount > > [index],1); > > barrier(CLK_LOCAL_MEM_FENCE); > > } > > > > which violates opencl, which can cause undefined behavior on the underlying > hardware, as each barrier is unique. > > > > So we want to disable all optimizations around barrier instructions, but not > in other cases when no barrier instruction exists. One way to do this is to > mark an instruction as not being copyable, but is there a method of doing > this in LLVM? > > > > Also, this barrier does not map to llvm.barrier because llvm.barrier only > seems to worry about memory operations and not synchronization between > threads. > > > > Thanks for any help, > > Micah > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >
On Wed, Oct 7, 2009 at 11:20 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote:> Is there a current way to specify that an instruction or function call > cannot be duplicated and thus any optimizations that might want to duplicate > this instruction would fail?No. Anything can be duplicated. That could change, but you would need to make a strong case for why other solutions won't work.> which violates opencl, which can cause undefined behavior on the underlying > hardware, as each barrier is unique.If you need to maintain uniqueness, there are other ways to do that without preventing the global from being duplicated. For example, you could associate each barrier with a unique global by making the intrinsic take it as a parameter. -Eli
> -----Original Message----- > From: Eli Friedman [mailto:eli.friedman at gmail.com] > Sent: Wednesday, October 07, 2009 5:50 PM > To: Villmow, Micah > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Instructions that cannot be duplicated > > On Wed, Oct 7, 2009 at 11:20 AM, Villmow, Micah<Micah.Villmow at amd.com>> wrote: > > Is there a current way to specify that an instruction or function > call > > cannot be duplicated and thus any optimizations that might want to > duplicate > > this instruction would fail? > > No. Anything can be duplicated. That could change, but you would > need to make a strong case for why other solutions won't work.[Villmow, Micah] Well the problem is that the function in question cannot get duplicated because it has side-effects that duplicating causes undefined behavior on vector hardware. Also, moving the instruction inside of flow control when it is originally outside of flow control produces undefined behavior. There currently is no way to specify this in LLVM that I know of. We've tried lowering it to an intrinsic and setting MayWriteMem and this does not solve the problem. After looking at the llvm IR, there is no equivalent method of representing an instruction that is an execution barrier(not a memory barrier, which llvm.barrier.[ss|ll|ls|sl] is). If you have any idea's, we would be willing to give them a try. On the unique barrier issue, even if the barrier is given a unique global identifier, it is the function duplication that causes the problem. A unique global identifier lets us identify that invalid optimizations have occurred, but it does not guarantee correctness since the barrier function is unique per function call. So any sort of duplication is invalid. Micah