Displaying 9 results from an estimated 9 matches for "get_group_id".
2011 Oct 10
3
[LLVMdev] Disable Short-Circuit Evaluation?
Is there any way to disable short-circuit evaluation of expressions in
Clang/LLVM?
Let's say I have C code like the following:
bool validX = get_group_id(0) > 32;
int globalIndexY0 = get_group_id(1)*186 + 6*get_local_id(1) + 0 + 1;
bool valid0 = validX && globalIndexY0 >= 4 && globalIndexY0 < 3910;
int globalIndexY1 = get_group_id(1)*186 + 6*get_local_id(1) + 1 + 1;
bool valid1 = validX && globalIndexY1...
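A minimal sketch of one common source-level workaround, not necessarily the answer given in this thread: when every operand is free of side effects, the comparisons can be combined with bitwise `&` instead of `&&`, so that all operands are always evaluated. The function names below are hypothetical, and the bounds 4 and 3910 are copied from the snippet above. Whether the generated code ends up branch-free still depends on the compiler and target.

```c
#include <stdbool.h>

/* Hypothetical helper: combine the checks with bitwise & so no operand
 * is skipped. Each comparison yields 0 or 1, so & acts as a logical AND.
 * Only safe because none of the operands has side effects. */
bool in_range_no_short_circuit(bool valid_x, int index_y)
{
    return valid_x & (index_y >= 4) & (index_y < 3910);
}

/* The same check written with &&, which evaluates left to right and
 * stops at the first false operand. */
bool in_range_short_circuit(bool valid_x, int index_y)
{
    return valid_x && index_y >= 4 && index_y < 3910;
}
```

Both functions return the same value for every input; they differ only in whether the later comparisons may be skipped.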
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...workitem intrinsics
(commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode
($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code
segment,
define linkonce_odr i32 @get_global_id(i32 %dim) #5 {
entry:
switch i32 %dim, label %get_local_id.exit [
i32 0, label %get_group_id.exit.thread
i32 1, label %get_group_id.exit.thread22
i32 2, label %get_group_id.exit.thread24
]
get_group_id.exit.thread: ; preds = %entry
%x.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #13
%x.i12 = tail call i32 @llvm.r600.read.local.size.x() #3
%mul26...
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...iginally use the intrinsics like
llvm.r600.read.local.size.x().
I executed the generated object file with one version of the hsa-runtime
[1] provided by Mr. Stellard. When there was more than one workgroup, the
output of the program was incorrect. I guessed this might be
because get_group_id() always returned 1 (I am not quite sure what was going on
at that time).
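To illustrate why a constant get_group_id() would only show up with more than one workgroup, here is a small host-side model in C. It assumes the standard OpenCL relationship for one dimension with a zero global offset: global id = group id * local size + local id. The function name `model_global_id` is hypothetical and is not part of libclc or the HSA runtime.

```c
#include <stddef.h>

/* Hypothetical model of get_global_id() for one dimension, assuming a
 * global offset of zero. If group_id were stuck at a single value, every
 * workgroup would produce the same range of global ids. */
size_t model_global_id(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}
```

With a local size of 64, work-item 5 of group 0 maps to global id 5 and work-item 5 of group 2 maps to 133. If the group id were the same constant for both, the two work-items would collide on one global id, which is consistent with the wrong output described above.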
When I compile such cases using the current LLVM trunk, it uses a set of
intrinsics starting with llvm.amdgcn, while it still
uses llvm.r600.read.local.size.x(). The output LLVM IR looks like:
define void @g(float addrspace(1)* n...
2011 Oct 10
1
[LLVMdev] [cfe-dev] Disable Short-Circuit Evaluation?
Justin Holewinski <justin.holewinski at gmail.com> writes:
> int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1;
> bool valid2 = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910;
>
> Clang, even at -O0, is performing short-circuit evaluation of these
> expressions, resulting in a fair number of branch instructions being
> gener...
2011 Oct 10
0
[LLVMdev] [cfe-dev] Disable Short-Circuit Evaluation?
10.10.2011, 18:29, "David A. Greene" <greened at obbligato.org>:
> Justin Holewinski <justin.holewinski at gmail.com> writes:
>
>> int globalIndexY2 = get_group_id(1)*186 + 6*get_local_id(1) + 2 + 1;
>> bool valid2 = validX && globalIndexY2 >= 4 && globalIndexY2 < 3910;
>>
>> Clang, even at -O0, is performing short-circuit evaluation of these
>> expressions, resulting in a fair number of branch instructio...
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
...>
> declare float @llvm.exp.f32(float) nounwind readonly
>
> declare float @llvm.log.f32(float) nounwind readonly
>
> declare float @fabs(float)
>
> define void @BAD(%1* noalias nocapture %arg_struct, i32 %get_work_dim, i32* %get_global_size, i32* %get_local_size, i32* %get_group_id) {
> entry:
> %0 = getelementptr %1* %arg_struct, i64 0, i32 0
> %1 = load float addrspace(1)** %0, align 8
> %2 = getelementptr %1* %arg_struct, i64 0, i32 2
> %3 = load float addrspace(1)** %2, align 8
> %4 = getelementptr %1* %arg_struct, i64 0, i32 3
> %5 = load float...
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody,
I have stumbled upon a test case (the attached module is a slightly
reduced version) that shows extremely reduced performance on Linux
compared to Windows when executed using LLVM's JIT.
We narrowed the problem down to the actual code being generated; the
source IR on both systems is the same.
Try compiling the attached module:
llc -O3 -filetype=asm -o BAD.s BAD.ll
Under
2014 Aug 20
2
[LLVMdev] LLVM CreateStructGEP type assert error
...global RB_t *cB) {
local float temp_on_cuda[BLOCK_SIZE][BLOCK_SIZE];
local float power_on_cuda[BLOCK_SIZE][BLOCK_SIZE];
local float temp_t[BLOCK_SIZE][BLOCK_SIZE]; // saving temporary temperature result
float amb_temp = 80.0f;
float step_div_Cap;
float Rx_1,Ry_1,Rz_1;
int bx = get_group_id(0);
int by = get_group_id(1);
int tx = get_local_id(0);
int ty = get_local_id(1);
step_div_Cap=step/Cap;
Rx_1=1/Rx;
Ry_1=1/Ry;
Rz_1=1/Rz;
// each block finally computes result for a small block
// after N iterations.
// it is the non-overlapping small blocks that cover
// all the i...
2014 Aug 20
2
[LLVMdev] LLVM CreateStructGEP type assert error
If I do M.dump(), at the top of the output I have:
%struct.RB = type opaque
Further down I have:
@.str18 = internal addrspace(2) constant [13 x i8] c"RB_t*\00"
However, nowhere does it dump the full struct type when I call "M.dump()". I have it explicitly defined above the kernel in the kernel file, but LLVM doesn't seem to pick it up.
Opaque is a placeholder until it