Displaying 5 results from an estimated 5 matches for "get_local_size".
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...t it doesn't work.
Thanks.
Best regards,
李弘宇 (Li, Hong-Yu)
Department of Computer Science & Information Engineering
National Taiwan University
On Sun, Mar 6, 2016 at 12:59 AM, Liu Xin <navy.xliu at gmail.com> wrote:
> Li, Hong-Yu,
>
> it's because get_global_id() uses get_local_size:
> _CLC_DEF size_t get_global_id(uint dim) {
>   return get_group_id(dim)*get_local_size(dim) + get_local_id(dim);
> }
>
> in libclc/amdgcn, 'get_local_size' invokes r600-xxx intrinsics. I doubt
> that libclc has ever supported the hsa-runtime.
>
>
> thanks,
>...
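The index arithmetic quoted above can be checked on the host with a minimal sketch in plain C. The struct `work_item_t` and the function `model_global_id` are hypothetical names introduced only for illustration; they are not part of libclc or any runtime, and the sketch assumes a single dimension and no global work offset.

```c
#include <stddef.h>

/* Hypothetical host-side model of one work-item in one dimension.
 * These fields stand in for the values that get_group_id(dim),
 * get_local_size(dim) and get_local_id(dim) would return on the device. */
typedef struct {
    size_t group_id;   /* index of the workgroup */
    size_t local_size; /* number of work-items per workgroup */
    size_t local_id;   /* index of the work-item inside its workgroup */
} work_item_t;

/* Same formula as the quoted get_global_id():
 * group_id * local_size + local_id. */
static size_t model_global_id(work_item_t w) {
    return w.group_id * w.local_size + w.local_id;
}
```

With a single workgroup, group_id is 0 and the local size drops out of the product, so a wrong local size would only affect the result once a second workgroup is launched.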
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Dear Developers,
I compiled an OpenCL kernel before (last November) like
__kernel void g(__global float* array)
{
    array[get_global_id(0)] = 1;
}
with libclc, which originally used intrinsics like
llvm.r600.read.local.size.x().
I executed the generated object file with one version of the hsa-runtime
[1] provided by Mr. Stellard. When there was more than one workgroup, the
output
2011 Jun 01
0
[LLVMdev] Question about debugging dwarf generated by LLVM
Say I have the following program and dwarf line number table.
Program:
kernel void nbt03( global uint *ip )
{
    size_t groupSize = get_local_size(0);
    for (int i=0; i < 256; ++i) {
        *ip += groupSize;
    }
}
Line Number Statements:
Extended opcode 2: set Address to 0x38
Special opcode 6: advance Address by 0 to 0x38 and Line by 1 to 2
Set column to 40
Extended opcode 2: set Address to 0x3e
Special opcode 6: advance Ad...
2010 Sep 29
0
[LLVMdev] spilling & xmm register usage
...oat) nounwind readonly
>
> declare float @llvm.exp.f32(float) nounwind readonly
>
> declare float @llvm.log.f32(float) nounwind readonly
>
> declare float @fabs(float)
>
> define void @BAD(%1* noalias nocapture %arg_struct, i32 %get_work_dim, i32* %get_global_size, i32* %get_local_size, i32* %get_group_id) {
> entry:
> %0 = getelementptr %1* %arg_struct, i64 0, i32 0
> %1 = load float addrspace(1)** %0, align 8
> %2 = getelementptr %1* %arg_struct, i64 0, i32 2
> %3 = load float addrspace(1)** %2, align 8
> %4 = getelementptr %1* %arg_struct, i64 0, i32 3
>...
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody,
I have stumbled upon a test case (the attached module is a slightly
reduced version) that shows extremely reduced performance on Linux
compared to Windows when executed using LLVM's JIT.
We narrowed the problem down to the actual code being generated; the
source IR on both systems is the same.
Try compiling the attached module:
llc -O3 -filetype=asm -o BAD.s BAD.ll
Under