search for: get_local_size

Displaying 5 results from an estimated 5 matches for "get_local_size".

2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...t it doesn't work. Thanks. Best regards, 李弘宇 (Li, Hong-Yu) Department of Computer Science & Information Engineering National Taiwan University On Sun, Mar 6, 2016 at 12:59 AM, Liu Xin <navy.xliu at gmail.com> wrote: > Li, Hong-Yu, > > it's because get_global_id() uses get_local_size > _CLC_DEF size_t get_global_id(uint dim) { > return get_group_id(dim)*get_local_size(dim) + get_local_id(dim); > } > > in libclc/amdgcn, 'get_local_size' invokes r600-xxx intrinsics. I doubt > that libclc ever supports hsa-runtime before. > > > thanks, >...
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Dear Developers, I compiled an OpenCL kernel before (in November of last year) like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which would originally use intrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard. When there was more than one workgroup, the output
2011 Jun 01
0
[LLVMdev] Question about debugging dwarf generated by LLVM
Say I have the following program and dwarf line number table. Program: kernel void nbt03( global uint *ip ) { size_t groupSize = get_local_size(0); for (int i=0; i < 256; ++i) { *ip += groupSize; } } Line Number Statements: Extended opcode 2: set Address to 0x38 Special opcode 6: advance Address by 0 to 0x38 and Line by 1 to 2 Set column to 40 Extended opcode 2: set Address to 0x3e Special opcode 6: advance Ad...
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
...oat) nounwind readonly > > declare float @llvm.exp.f32(float) nounwind readonly > > declare float @llvm.log.f32(float) nounwind readonly > > declare float @fabs(float) > > define void @BAD(%1* noalias nocapture %arg_struct, i32 %get_work_dim, i32* %get_global_size, i32* %get_local_size, i32* %get_group_id) { > entry: > %0 = getelementptr %1* %arg_struct, i64 0, i32 0 > %1 = load float addrspace(1)** %0, align 8 > %2 = getelementptr %1* %arg_struct, i64 0, i32 2 > %3 = load float addrspace(1)** %2, align 8 > %4 = getelementptr %1* %arg_struct, i64 0, i32 3 ...
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on Linux compared to Windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated; the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under