Displaying 11 results from an estimated 11 matches for "clk_local_mem_fence".
2009 Oct 07
3
[LLVMdev] Instructions that cannot be duplicated
...al int sTemp[1]; // amd opencl needed this to be an array
const unsigned int point_id = get_local_id(0);
int index = 0;
int i, addr;
int xx = get_local_id(0);
clusterCount[xx] = 0;
if (get_local_id(0) == 0) {
    sTemp[0] = 0; // sTemp is for prefix sum
}
barrier(CLK_LOCAL_MEM_FENCE);
int idWithinCluster = 300; // anything other than zero
if (point_id < num_objects) {
    idWithinCluster = atom_add(&clusterCount[index], 1);
}
barrier(CLK_LOCAL_MEM_FENCE);
int numMembers = 2;
i...
2011 Oct 14
2
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...a memory fence across __private memory (only __local and
__global), any access to that memory would invoke undefined behaviour.
For example, consider the following (2 work-items in a work-group):
__kernel void foo() {
    int x = 0;
    int *__local p;
    if (get_local_id(0) == 0) p = &x;
    barrier(CLK_LOCAL_MEM_FENCE);
    if (get_local_id(0) == 1) *p = 1;
    barrier(CLK_LOCAL_MEM_FENCE);
    // what is the value of x in work-item 0 here?
}
The value of x at the comment is undefined, because no fence across
__private memory was queued.
Perhaps more straightforwardly, referring to the following passage
in section...
2011 Oct 15
0
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...ory (only __local and
> __global), any access to that memory would invoke undefined behaviour.
> For example, consider the following (2 work-items in a work-group):
>
> __kernel void foo() {
> int x = 0;
> int *__local p;
> if (get_local_id(0) == 0) p = &x;
> barrier(CLK_LOCAL_MEM_FENCE);
> if (get_local_id(0) == 1) *p = 1;
> barrier(CLK_LOCAL_MEM_FENCE);
> // what is the value of x in work-item 0 here?
> }
>
> The value of x at the comment is undefined, because no fence across
> __private memory was queued.
>
> Perhaps more straightforwardly, referri...
2010 Dec 07
3
[LLVMdev] [cfe-dev] OpenCL support
...hy you need a system-defined
intrinsic.)
So a kernel function like this:
void foo(__global int*A) {
    __local int vint;
    __local int *vpint;
    __local int const *vcpint;
    __local int volatile vvint;
    int a = A[0];
    vint = a;
    vvint = a;
    int a2 = vint;
    int va2 = vvint;
    barrier(CLK_LOCAL_MEM_FENCE);
    A[0] = a2 + va2;
}
is translated to this, which does pass through Clang, with __local
meaning attrib addrspace(2):
extern __local void * __get_work_group_local_base_addr(void); // intrinsic
void foo(__global int*A) {
    __local struct __local_vars_s {
        int vint;
        int *vpint;...
2010 Dec 09
0
[LLVMdev] [cfe-dev] OpenCL support
...on like this:
>
> void foo(__global int*A) {
> __local int vint;
> __local int *vpint;
> __local int const *vcpint;
> __local int volatile vvint;
> int a = A[0];
> vint = a;
> vvint = a;
> int a2 = vint;
> int va2 = vvint;
> barrier(CLK_LOCAL_MEM_FENCE);
> A[0] = a2 + va2;
> }
>
[Villmow, Micah] This example is incorrect. There is a race condition between the writes to vint and vvint and the reads from vint/vvint. The reason is that all threads in a work-group share the memory that vint is allocated in. So if you have two work-...
2010 Dec 06
0
[LLVMdev] [cfe-dev] OpenCL support
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Peter Collingbourne
> Sent: Monday, December 06, 2010 2:56 PM
> To: David Neto
> Cc: cfe-dev at cs.uiuc.edu; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [cfe-dev] OpenCL support
>
> Hi David,
>
> On Mon, Dec 06, 2010 at 11:14:42AM -0500,
2010 Dec 06
2
[LLVMdev] [cfe-dev] OpenCL support
Hi David,
On Mon, Dec 06, 2010 at 11:14:42AM -0500, David Neto wrote:
> >> It
> >> seems it would be a good idea to transform the code so that uses of x
> >> become loads and stores from memory, and the address for that memory
> >> is returned by a builtin function that itself is dependent on work
> >> group ids.
> >>
> >> I'm
2011 Oct 13
0
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
On Thu, Oct 13, 2011 at 11:57 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Hi Justin,
>
> Thanks for bringing this up, I think it's important to discuss
> these issues here.
>
> On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:
> > It is becoming increasingly clear to me that LLVM address spaces are not the
> > general solution
2014 Aug 20
2
[LLVMdev] LLVM CreateStructGEP type assert error
...grid_rows-1) && IN_RANGE(loadXidx, 0, grid_cols-1)){
    temp_on_cuda[ty][tx] = temp_src[index]; // Load the temperature data from global memory to shared memory
    power_on_cuda[ty][tx] = power[index]; // Load the power data from global memory to shared memory
}
barrier(CLK_LOCAL_MEM_FENCE);
// effective range within this block that falls within
// the valid range of the input data
// used to rule out computation outside the boundary.
int validYmin = (blkY < 0) ? -blkY : 0;
int validYmax = (blkYmax > grid_rows-1) ? BLOCK_SIZE-1-(blkYmax-grid_rows+1) : BLOCK_SIZE-1;
int...
2011 Oct 13
4
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
Hi Justin,
Thanks for bringing this up, I think it's important to discuss
these issues here.
On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:
> It is becoming increasingly clear to me that LLVM address spaces are not the
> general solution to OpenCL/CUDA memory spaces. They are a convenient hack to
> get things working in the short term, but I think a more
2014 Aug 20
2
[LLVMdev] LLVM CreateStructGEP type assert error
If I do M.dump(), at the top of the output I have:
%struct.RB = type opaque
Further down I have:
@.str18 = internal addrspace(2) constant [13 x i8] c"RB_t*\00"
However nowhere does it dump the full struct type when I call "M.dump()". I have it explicitly defined above the kernel in the kernel file, but LLVM doesn't seem to pick it up.
Opaque is a placeholder until it