search for: clk_local_mem_f

Displaying 11 results from an estimated 11 matches for "clk_local_mem_f".

2009 Oct 07
3
[LLVMdev] Instructions that cannot be duplicated
...al int sTemp[1]; // amd opencl needed this to be an array
  const unsigned int point_id = get_local_id(0);
  int index = 0;
  int i, addr;
  int xx = get_local_id(0);
  clusterCount[xx] = 0;
  if (get_local_id(0) == 0) {
    sTemp[0] = 0; // sTemp is for prefix sum
  }
  barrier(CLK_LOCAL_MEM_FENCE);
  int idWithinCluster = 300; // anything other than zero
  if (point_id < num_objects) {
    idWithinCluster = atom_add(&clusterCount[index], 1);
  }
  barrier(CLK_LOCAL_MEM_FENCE);
  int numMembers = 2;...
2011 Oct 14
2
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...a memory fence across __private memory (only __local and __global), any access to that memory would invoke undefined behaviour. For example, consider the following (2 work-items in a work-group):
  __kernel void foo() {
    int x = 0;
    int *__local p;
    if (get_local_id(0) == 0) p = &x;
    barrier(CLK_LOCAL_MEM_FENCE);
    if (get_local_id(0) == 1) *p = 1;
    barrier(CLK_LOCAL_MEM_FENCE);
    // what is the value of x in work-item 0 here?
  }
The value of x at the comment is undefined, because no fence across __private memory was queued. Perhaps more straightforwardly, referring to the following passage in sec...
2011 Oct 15
0
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...ory (only __local and
> __global), any access to that memory would invoke undefined behaviour.
> For example, consider the following (2 work-items in a work-group):
>
> __kernel void foo() {
>   int x = 0;
>   int *__local p;
>   if (get_local_id(0) == 0) p = &x;
>   barrier(CLK_LOCAL_MEM_FENCE);
>   if (get_local_id(0) == 1) *p = 1;
>   barrier(CLK_LOCAL_MEM_FENCE);
>   // what is the value of x in work-item 0 here?
> }
>
> The value of x at the comment is undefined, because no fence across
> __private memory was queued.
>
> Perhaps more straightforwardly, ref...
2010 Dec 07
3
[LLVMdev] [cfe-dev] OpenCL support
...hy you need a system-defined intrinsic.) So a kernel function like this:
  void foo(__global int*A) {
    __local int vint;
    __local int *vpint;
    __local int const *vcpint;
    __local int volatile vvint;
    int a = A[0];
    vint = a;
    vvint = a;
    int a2 = vint;
    int va2 = vvint;
    barrier(CLK_LOCAL_MEM_FENCE);
    A[0] = a2 + va2;
  }
is translated to this, which does pass through Clang, with __local meaning attrib addrspace(2):
  extern __local void * __get_work_group_local_base_addr(void); // intrinsic
  void foo(__global int*A) {
    __local struct __local_vars_s {
      int vint;
      int *vpint;...
2010 Dec 09
0
[LLVMdev] [cfe-dev] OpenCL support
...on like this:
>
> void foo(__global int*A) {
>   __local int vint;
>   __local int *vpint;
>   __local int const *vcpint;
>   __local int volatile vvint;
>   int a = A[0];
>   vint = a;
>   vvint = a;
>   int a2 = vint;
>   int va2 = vvint;
>   barrier(CLK_LOCAL_MEM_FENCE);
>   A[0] = a2 + va2;
> }
> [Villmow, Micah] This example is incorrect. There is a race condition between the writes to vint and vvint and the reads from vint/vvint. The reason is that all threads in a work-group share the memory that vint is allocated in. So if you have two w...
2010 Dec 06
0
[LLVMdev] [cfe-dev] OpenCL support
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Peter Collingbourne
> Sent: Monday, December 06, 2010 2:56 PM
> To: David Neto
> Cc: cfe-dev at cs.uiuc.edu; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] [cfe-dev] OpenCL support
>
> Hi David,
>
> On Mon, Dec 06, 2010 at 11:14:42AM -0500,
2010 Dec 06
2
[LLVMdev] [cfe-dev] OpenCL support
Hi David,
On Mon, Dec 06, 2010 at 11:14:42AM -0500, David Neto wrote:
> >> It
> >> seems it would be a good idea to transform the code so that uses of x
> >> become loads and stores from memory, and the address for that memory
> >> is returned by a builtin function that itself is dependent on work
> >> group ids.
> >>
> >> I'm
2011 Oct 13
0
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
On Thu, Oct 13, 2011 at 11:57 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Hi Justin,
>
> Thanks for bringing this up, I think it's important to discuss
> these issues here.
>
> On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:
> > It is becoming increasingly clear to me that LLVM address spaces are not the
> > general solution
2014 Aug 20
2
[LLVMdev] LLVM CreateStructGEP type assert error
...grid_rows-1) && IN_RANGE(loadXidx, 0, grid_cols-1)){
    temp_on_cuda[ty][tx] = temp_src[index]; // Load the temperature data from global memory to shared memory
    power_on_cuda[ty][tx] = power[index]; // Load the power data from global memory to shared memory
  }
  barrier(CLK_LOCAL_MEM_FENCE);
  // effective range within this block that falls within
  // the valid range of the input data
  // used to rule out computation outside the boundary.
  int validYmin = (blkY < 0) ? -blkY : 0;
  int validYmax = (blkYmax > grid_rows-1) ? BLOCK_SIZE-1-(blkYmax-grid_rows+1) : BLOCK_SIZE-1;...
2011 Oct 13
4
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
Hi Justin,
Thanks for bringing this up, I think it's important to discuss these issues here.
On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:
> It is becoming increasingly clear to me that LLVM address spaces are not the
> general solution to OpenCL/CUDA memory spaces. They are a convenient hack to
> get things working in the short term, but I think a more
2014 Aug 20
2
[LLVMdev] LLVM CreateStructGEP type assert error
If I do M.dump(), at the top of the output I have:
  %struct.RB = type opaque
Further down I have:
  @.str18 = internal addrspace(2) constant [13 x i8] c"RB_t*\00"
However nowhere does it dump the full struct type when I call "M.dump()". I have it explicitly defined above the kernel in the kernel file, but LLVM doesn't seem to pick it up. Opaque is a placeholder until it