search for: get_group_id

Displaying 9 results from an estimated 9 matches for "get_group_id".

2011 Oct 10
3
[LLVMdev] Disable Short-Circuit Evaluation?
Is there any way to disable short-circuit evaluation of expressions in Clang/LLVM? Let's say I have C code like the following: bool validX = get_group_id(0) > 32; int globalIndexY0 = get_group_id(1)*186 + 6*get_local_id(1) + 0 + 1; bool valid0 = validX && globalIndexY0 >= 4 && globalIndexY0 < 3910; int globalIndexY1 = get_group_id(1)*186 + 6*get_local_id(1) + 1 + 1; bool valid1 = validX && globalIndexY1...
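The distinction the question turns on can be sketched in plain C. This is a minimal illustration, not code from the thread: `in_range`, `checks_run`, `valid_short_circuit`, `valid_eager`, and `demo_short_circuit` are hypothetical names, and the bounds 4 and 3910 are borrowed from the snippet above. With `&&` the right-hand operand is evaluated only when the left-hand one is true; with bitwise `&` both operands are always evaluated. Whether the `&` form actually compiles to branch-free code depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter: records how often the range check is evaluated. */
static int checks_run = 0;

/* Hypothetical range check; uses '&' so it contains no short-circuit itself. */
static bool in_range(int y)
{
    checks_run++;
    return (y >= 4) & (y < 3910);
}

/* Logical AND: in_range() is skipped whenever valid_x is false. */
bool valid_short_circuit(bool valid_x, int y)
{
    return valid_x && in_range(y);
}

/* Bitwise AND: both operands are always evaluated, so in_range() always runs. */
bool valid_eager(bool valid_x, int y)
{
    return valid_x & in_range(y);
}

/* Small self-check showing the difference in evaluation. */
void demo_short_circuit(void)
{
    checks_run = 0;
    assert(!valid_short_circuit(false, 10));
    assert(checks_run == 0);   /* range check was skipped */
    assert(!valid_eager(false, 10));
    assert(checks_run == 1);   /* range check ran anyway */
}
```

Both variants return the same value; they differ only in whether the right-hand operand is evaluated, which matters when that operand has side effects or when the branch itself is the cost being avoided.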
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment, define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entry: switch i32 %dim, label %get_local_id.exit [ i32 0, label %get_group_id.exit.thread i32 1, label %get_group_id.exit.thread22 i32 2, label %get_group_id.exit.thread24 ] get_group_id.exit.thread: ; preds = %entry %x.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #13 %x.i12 = tail call i32 @llvm.r600.read.local.size.x() #3 %mul26...
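The IR above multiplies the workgroup id by the local size, which matches the usual OpenCL relation between the work-item index functions. A minimal host-side sketch of that relation for one dimension follows, assuming a uniform work-group size and a zero global offset; `model_global_id` is a hypothetical name and not part of libclc or OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one NDRange dimension:
 *   global_id = group_id * local_size + local_id
 * Assumes a uniform work-group size and a zero global offset.
 */
size_t model_global_id(size_t group_id, size_t local_size, size_t local_id)
{
    /* A local id always lies inside its work-group. */
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

For example, work-item 5 of work-group 2 with a local size of 64 maps to global id 2 * 64 + 5 = 133.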
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...iginally use the intrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard; when there was more than one workgroup, the output of the program wasn't correct at that time. I guessed this might be because get_group_id() always returned 1 (not quite sure what was going on at that time). When I compile such cases using current llvm trunk, it uses a set of intrinsics starting with llvm.amdgcn, while it still uses llvm.r600.read.local.size.x(). The output LLVM IR code is like: define void @g(float addrspace(1)* n...
2011 Oct 10
1
[LLVMdev] [cfe-dev] Disable Short-Circuit Evaluation?
Justin Holewinski <justin.holewinski at gmail.com> writes: > int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1; > bool valid2       = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910; > > Clang, even at -O0, is performing short-circuit evaluation of these > expressions, resulting in a fair number of branch instructions being > gener...
2011 Oct 10
0
[LLVMdev] [cfe-dev] Disable Short-Circuit Evaluation?
10.10.2011, 18:29, "David A. Greene" <greened at obbligato.org>: > Justin Holewinski <justin.holewinski at gmail.com> writes: > >>  int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1; >>  bool valid2       = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910; >> >>  Clang, even at -O0, is performing short-circuit evaluation of these >>  expressions, resulting in a fair number of branch instructio...
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
...> > declare float @llvm.exp.f32(float) nounwind readonly > > declare float @llvm.log.f32(float) nounwind readonly > > declare float @fabs(float) > > define void @BAD(%1* noalias nocapture %arg_struct, i32 %get_work_dim, i32* %get_global_size, i32* %get_local_size, i32* %get_group_id) { > entry: > %0 = getelementptr %1* %arg_struct, i64 0, i32 0 > %1 = load float addrspace(1)** %0, align 8 > %2 = getelementptr %1* %arg_struct, i64 0, i32 2 > %3 = load float addrspace(1)** %2, align 8 > %4 = getelementptr %1* %arg_struct, i64 0, i32 3 > %5 = load float...
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under
2014 Aug 20
2
[LLVMdev] LLVM CreateStructGEP type assert error
...global RB_t *cB) { local float temp_on_cuda[BLOCK_SIZE][BLOCK_SIZE]; local float power_on_cuda[BLOCK_SIZE][BLOCK_SIZE]; local float temp_t[BLOCK_SIZE][BLOCK_SIZE]; // saving temporary temperature result float amb_temp = 80.0f; float step_div_Cap; float Rx_1,Ry_1,Rz_1; int bx = get_group_id(0); int by = get_group_id(1); int tx = get_local_id(0); int ty = get_local_id(1); step_div_Cap=step/Cap; Rx_1=1/Rx; Ry_1=1/Ry; Rz_1=1/Rz; // each block finally computes result for a small block // after N iterations. // it is the non-overlapping small blocks that cover // all the i...
2014 Aug 20
2
[LLVMdev] LLVM CreateStructGEP type assert error
If I do M.dump(), at the top of the output I have: %struct.RB = type opaque Further down I have: @.str18 = internal addrspace(2) constant [13 x i8] c"RB_t*\00" However nowhere does it dump the full struct type when I call "M.dump()". I have it explicitly defined above the kernel in the kernel file, but LLVM doesn't seem to pick it up. Opaque is a placeholder until it