Displaying 20 results from an estimated 30 matches for "get_local_id".

2011 Oct 10
3
[LLVMdev] Disable Short-Circuit Evaluation?
Is there any way to disable short-circuit evaluation of expressions in Clang/LLVM? Let's say I have C code like the following: bool validX = get_group_id(0) > 32; int globalIndexY0 = get_group_id(1)*186 + 6*get_local_id(1) + 0 + 1; bool valid0 = validX && globalIndexY0 >= 4 && globalIndexY0 < 3910; int globalIndexY1 = get_group_id(1)*186 + 6*get_local_id(1) + 1 + 1; bool valid1 = validX && globalIndexY1 >= 4 && globalIndexY1 < 3910; int globalIndexY2 = get_...
2009 Oct 07
3
[LLVMdev] Instructions that cannot be duplicated
...um_attributes, const int num_objects, __global int* delta_d ) { __local int clusterCount[256]; __local int sTemp[1]; // amd opencl needed this to be an array const unsigned int point_id = get_local_id(0); int index = 0; int i, addr; int xx = get_local_id(0); clusterCount[xx] = 0; if(get_local_id(0) == 0){ sTemp[0] = 0; //sTemp is for prefix sum } barrier(CLK_LOCAL_MEM_FENCE); int idWithinCluster = 300; // anything other than zero if (point_id < num_...
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment, define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entry: switch i32 %dim, label %get_local_id.exit [ i32 0, label %get_group_id.exit.thread i32 1, label %get_group_id.exit.thread22 i32 2, label %get_group_id.exit.thread24 ] get_group_id.exit.thread: ; preds = %entry %x.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #13 %x.i12 = tail call i32 @llvm...
2011 Oct 14
2
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...isation, but since there is no way to queue a memory fence across __private memory (only __local and __global), any access to that memory would invoke undefined behaviour. For example, consider the following (2 work-items in a work-group): __kernel void foo() { int x = 0; int *__local p; if (get_local_id(0) == 0) p = &x; barrier(CLK_LOCAL_MEM_FENCE); if (get_local_id(0) == 1) *p = 1; barrier(CLK_LOCAL_MEM_FENCE); // what is the value of x in work-item 0 here? } The value of x at the comment is undefined, because no fence across __private memory was queued. Perhaps more straightforwa...
2011 Oct 10
1
[LLVMdev] [cfe-dev] Disable Short-Circuit Evaluation?
Justin Holewinski <justin.holewinski at gmail.com> writes: > int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1; > bool valid2       = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910; > > Clang, even at -O0, is performing short-circuit evaluation of these > expressions, resulting in a fair number of branch instructions being > generated. It has to. This...
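The reply above says Clang has to short-circuit `&&`. A minimal C sketch of why, assuming a hypothetical helper `in_range` and a hypothetical call counter `rhs_evals` (neither appears in the thread): the logical form must skip the right-hand operand when the left one is false, while the bitwise form on `bool` operands evaluates both sides. Whether the bitwise form actually produces fewer branches depends on the compiler and target, so treat this as an illustration rather than the thread's conclusion.

```c
#include <stdbool.h>

/* Hypothetical counter: records how often the right-hand operand runs. */
static int rhs_evals = 0;

/* Hypothetical helper with a side effect, so evaluation is observable. */
static bool in_range(int v, int lo, int hi) {
    rhs_evals++;
    return (v >= lo) & (v < hi);
}

/* Logical AND: in_range() is skipped when validX is false. */
int count_rhs_evals_logical(bool validX, int y) {
    rhs_evals = 0;
    bool valid = validX && in_range(y, 4, 3910);
    (void)valid;
    return rhs_evals;
}

/* Bitwise AND on bool operands: both sides are always evaluated. */
int count_rhs_evals_bitwise(bool validX, int y) {
    rhs_evals = 0;
    bool valid = validX & in_range(y, 4, 3910);
    (void)valid;
    return rhs_evals;
}
```

With `validX` false, the logical version never calls `in_range`, whereas the bitwise version calls it once. That observable difference is the reason the compiler cannot drop the short-circuit on its own.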
2011 Oct 15
0
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...way to queue a memory fence across __private memory (only __local and > __global), any access to that memory would invoke undefined behaviour. > For example, consider the following (2 work-items in a work-group): > > __kernel void foo() { > int x = 0; > int *__local p; > if (get_local_id(0) == 0) p = &x; > barrier(CLK_LOCAL_MEM_FENCE); > if (get_local_id(0) == 1) *p = 1; > barrier(CLK_LOCAL_MEM_FENCE); > // what is the value of x in work-item 0 here? > } > > The value of x at the comment is undefined, because no fence across > __private memory was qu...
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Dear Developers, I compiled an OpenCL kernel before (on Nov. last year) like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which would originally use the intrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard, when there was more than one workgroup, the output
2011 Nov 14
2
[LLVMdev] PTX backend fatal error
...4.cl program I get several backend errors. I isolated one of them in the following kernel program: __kernel void kernel_function(__global int *input) { __local char localArray[16]; for(unsigned int index = 0; index < 16; ++index) localArray[index] = 0; input[0] = localArray[get_local_id(0)]; } fatal error: error in backend: Cannot select: 0x5810cc0: i32,ch = load 0x57fa148, 0x5810ac0, 0x58105c0<LD1[%arrayidx1], sext from i8> [ID=9] 0x5810ac0: i32 = add 0x58109c0, 0x5813640 [ORD=113] [ID=8] 0x58109c0: i32 = PTXISD::COPY_ADDRESS 0x5813540 [ID=7]...
2011 Apr 15
0
[LLVMdev] Valid debug information being deleted by DAGCombiner
...nt test case than Micah posted), here's a block of post-optimized IR for a dbg.value() call that the DAGCombiner ends up deleting... %8 = extractelement <4 x i32> %7, i32 0 ; <i32> [#uses=1] br label %9 ; <label>:9 ; preds = %get_local_id.exit %10 = phi i32 [ %8, %get_local_id.exit ] ; <i32> [#uses=1] br label %11 ; <label>:11 ; preds = %9 %12 = phi i32 [ %10, %9 ] ; <i32> [#uses=1] br label %get_global_id.exit get_global_id.exit:...
2011 Apr 15
2
[LLVMdev] Valid debug information being deleted by DAGCombiner
...nt test case than Micah posted), here's a block of post-optimized IR for a dbg.value() call that the DAGCombiner ends up deleting... %8 = extractelement <4 x i32> %7, i32 0 ; <i32> [#uses=1] br label %9 ; <label>:9 ; preds = %get_local_id.exit %10 = phi i32 [ %8, %get_local_id.exit ] ; <i32> [#uses=1] br label %11 ; <label>:11 ; preds = %9 %12 = phi i32 [ %10, %9 ] ; <i32> [#uses=1] br label %get_global_id.exit get_global_id.exit:...
2011 Oct 10
0
[LLVMdev] [cfe-dev] Disable Short-Circuit Evaluation?
10.10.2011, 18:29, "David A. Greene" <greened at obbligato.org>: > Justin Holewinski <justin.holewinski at gmail.com> writes: > >>  int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1; >>  bool valid2       = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910; >> >>  Clang, even at -O0, is performing short-circuit evaluation of these >>  expressions, resulting in a fair number of branch instructions being >>  gener...
2011 Apr 15
2
[LLVMdev] Valid debug information being deleted by DAGCombiner
On 4/14/11 8:22 PM, Villmow, Micah wrote: > > Found another bitcode file where a debug symbol is being dropped. > > In the attached bitcode file, the variable gid is not in the debug output. > Dumb question: Have you looked to see if mem2reg is destroying (or not maintaining) the debug information of interest (or put another way, was the variable gid promoted to an LLVM register
2011 Apr 15
0
[LLVMdev] Valid debug information being deleted by DAGCombiner
...ah posted), here’s a block of post-optimized IR for a dbg.value() call that the DAGCombiner ends up deleting… > > %8 = extractelement <4 x i32> %7, i32 0 ; <i32> [#uses=1] > br label %9 > > ; <label>:9 ; preds = %get_local_id.exit > %10 = phi i32 [ %8, %get_local_id.exit ] ; <i32> [#uses=1] > br label %11 > > ; <label>:11 ; preds = %9 > %12 = phi i32 [ %10, %9 ] ; <i32> [#uses=1] > br label %get_global_id.exit...
2013 Aug 11
0
[LLVMdev] Address space extension
...rough TBAA like metadata is omitted because I think we agree on that part. /// test.cl /// __kernel void convolve(const __global int *input, __constant int *mask, __global int *output) { unsigned x = get_global_id(0); output[x] = input[x] + mask[get_local_id(0)]; } The IR for R600 now is: /// test.r600.ll /// target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-v2048:2048:2048-n32:64...
2011 Nov 14
0
[LLVMdev] PTX backend fatal error
...nd errors. > > I isolated one of them in the following kernel program: > > __kernel void kernel_function(__global int *input) { > __local char localArray[16]; > for(unsigned int index = 0; index < 16; ++index) > localArray[index] = 0; > input[0] = localArray[get_local_id(0)]; > } > > fatal error: error in backend: Cannot select: > 0x5810cc0: i32,ch = load 0x57fa148, > 0x5810ac0, 0x58105c0<LD1[%arrayidx1], sext > from i8> [ID=9] > 0x5810ac0: i32 = add 0x58109c0, 0x5813640 [ORD=113] [ID=8] > 0x58109c0: i32 = PTXISD::CO...
2011 Nov 14
1
[LLVMdev] PTX backend fatal error
...ram64.cl program I get several backend errors. I isolated one of them in the following kernel program: __kernel void kernel_function(__global int *input) { __local char localArray[16]; for(unsigned int index = 0; index < 16; ++index) localArray[index] = 0; input[0] = localArray[get_local_id(0)]; } fatal error: error in backend: Cannot select: 0x5810cc0: i32,ch = load 0x57fa148, 0x5810ac0, 0x58105c0<LD1[%arrayidx1], sext from i8> [ID=9] 0x5810ac0: i32 = add 0x58109c0, 0x5813640 [ORD=113] [ID=8] 0x58109c0: i32 = PTXISD::COPY_ADDRESS 0x5813540 [ID=7] 0x5813...
2013 Aug 10
2
[LLVMdev] Address space extension
> -----Original Message----- > From: Michele Scandale [mailto:michele.scandale at gmail.com] > Sent: Saturday, August 10, 2013 6:29 AM > To: Micah Villmow > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Address space extension > > On 08/10/2013 02:47 PM, Micah Villmow wrote: > > Michele, > > The information you are trying to gather is fundamentally
2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...", that is, the processed kernel code does not usually have function calls. Well, printf() and some intrinsic calls might be exceptions. In such cases the vectorization could be simply not done and the parallelization can be attempted using some other method (e.g. pure unrolling), like usual. get_local_id is converted to regular iteration variables (local id space x, y, z) in the wiloop. I played yesterday a bit by kludge-hacking the LoopVectorizer code to skip the canVectorizeMemory() check for these wiloop constructs and it managed to vectorize a kernel as expected. > You need to implement som...
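The snippet above describes get_local_id being turned into ordinary iteration variables inside a work-item loop ("wiloop"). A rough sketch of that idea in plain C, under stated assumptions: `kernel_body` and `run_work_group` are hypothetical names, only the x dimension is shown, and this is not the code the poster's compiler generates.

```c
#include <stddef.h>

/* Hypothetical kernel body: local_id_x stands in for get_local_id(0). */
static void kernel_body(const int *in, int *out, size_t local_id_x) {
    out[local_id_x] = in[local_id_x] * 2;
}

/* Work-item loop: runs every work-item of one work-group serially,
 * so the local id is just the loop induction variable. */
void run_work_group(const int *in, int *out, size_t local_size_x) {
    for (size_t x = 0; x < local_size_x; ++x) {
        kernel_body(in, out, x);
    }
}
```

Once the work-items are an ordinary counted loop like this, a loop vectorizer can in principle treat the iterations as independent lanes, which is the situation the message is discussing.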
2011 Apr 15
1
[LLVMdev] Valid debug information being deleted by DAGCombiner
...nt test case than Micah posted), here's a block of post-optimized IR for a dbg.value() call that the DAGCombiner ends up deleting... %8 = extractelement <4 x i32> %7, i32 0 ; <i32> [#uses=1] br label %9 ; <label>:9 ; preds = %get_local_id.exit %10 = phi i32 [ %8, %get_local_id.exit ] ; <i32> [#uses=1] br label %11 ; <label>:11 ; preds = %9 %12 = phi i32 [ %10, %9 ] ; <i32> [#uses=1] br label %get_global_id.exit get_global_id.exit:...
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...de > does not usually have function calls. Well, printf() and some > intrinsic > calls might be exceptions. In such cases the vectorization could be > simply not done and the parallelization can be attempted using some > other > method (e.g. pure unrolling), like usual. > > get_local_id is converted to regular iteration variables (local id > space x, > y, z) in the wiloop. > > I played yesterday a bit by kludge-hacking the LoopVectorizer code to > skip the canVectorizeMemory() check for these wiloop constructs and > it > managed to vectorize a kernel as expect...