李弘宇 via llvm-dev
2016-Mar-05 16:11 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
Dear Developers,
I compiled a OpenCL kernel before (on Nov. last year) like
__kernel void g(__global float* array)
{
array[get_global_id(0)] = 1;
}
with libclc, which would originally use the instrinsics like
llvm.r600.read.local.size.x().
I executed the generated object file with one version of the hsa-runtime
[1] provided by Mr. Stellard, when there was more than one workgroup, the
output of the program wasn't correct at that time. I guessed this might be
because get_group_id() always returned 1 (not quite sure what was going on
at that time).
When I compile such cases using current llvm trunk, it uses a set of
instrinsics starting with llvm.amdgcn, while it still
uses llvm.r600.read.local.size.x(). The output LLVM IR code is like:
define void @g(float addrspace(1)* nocapture %array) #0 {
%x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2
%x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1
%mul26.i = mul i32 %x.i12.i, %x.i.i
%x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7
%add.i = add i32 %x.i4.i, %mul26.i
%0 = sext i32 %add.i to i64
%arrayidx = getelementptr inbounds float, float addrspace(1)* %array, i64
%0
store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa !8
ret void
}
which cannot be handled by llc with the message "the non-hsa instrinsic
with hsa target shown".
After looking into the log (r259297), my question is that is there other
intrinsic that support this case when the target is amdgcn--amdhsa? In the
log of r259297, it states that AMDGPUPromoteAlloca pass (a backend pass)
will generate this intrinsic, but even when I just emit-llvm without going
through llc, this intrinsic is still emitted.
[1] https://github.com/tstellarAMD/hsa-runtime
Regards,
李弘宇 (Li, Hong-Yu)
Department of Computer Science & Information Engineering
National Taiwan University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/b2fc7564/attachment.html>
Liu Xin via llvm-dev
2016-Mar-05 16:59 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
Li, Hong-Yu,
it's because get_group_id() uses get_local_size
_CLC_DEF size_t get_global_id(uint dim) {
return get_group_id(dim)*get_local_size(dim) + get_local_id(dim);
}
in libclc/amdgcn, 'get_local_size' invokes r600-xxx intrinsics. I
doubt
that libclc ever supports hsa-runtime before.
thanks,
--lx
On Sun, Mar 6, 2016 at 12:11 AM, 李弘宇 via llvm-dev <llvm-dev at
lists.llvm.org>
wrote:
> Dear Developers,
>
> I compiled a OpenCL kernel before (on Nov. last year) like
>
> __kernel void g(__global float* array)
> {
> array[get_global_id(0)] = 1;
> }
>
> with libclc, which would originally use the instrinsics like
> llvm.r600.read.local.size.x().
>
> I executed the generated object file with one version of the hsa-runtime
> [1] provided by Mr. Stellard, when there was more than one workgroup, the
> output of the program wasn't correct at that time. I guessed this might
be
> because get_group_id() always returned 1 (not quite sure what was going on
> at that time).
>
> When I compile such cases using current llvm trunk, it uses a set of
> instrinsics starting with llvm.amdgcn, while it still
> uses llvm.r600.read.local.size.x(). The output LLVM IR code is like:
>
> define void @g(float addrspace(1)* nocapture %array) #0 {
> %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2
> %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1
> %mul26.i = mul i32 %x.i12.i, %x.i.i
> %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7
> %add.i = add i32 %x.i4.i, %mul26.i
> %0 = sext i32 %add.i to i64
> %arrayidx = getelementptr inbounds float, float addrspace(1)* %array,
> i64 %0
> store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa
> !8
> ret void
> }
>
> which cannot be handled by llc with the message "the non-hsa
instrinsic
> with hsa target shown".
>
> After looking into the log (r259297), my question is that is there other
> intrinsic that support this case when the target is amdgcn--amdhsa? In the
> log of r259297, it states that AMDGPUPromoteAlloca pass (a backend pass)
> will generate this intrinsic, but even when I just emit-llvm without going
> through llc, this intrinsic is still emitted.
>
> [1] https://github.com/tstellarAMD/hsa-runtime
>
>
> Regards,
>
> 李弘宇 (Li, Hong-Yu)
> Department of Computer Science & Information Engineering
> National Taiwan University
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/efd4de1b/attachment.html>
李弘宇 via llvm-dev
2016-Mar-05 17:28 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu,
Thanks for your quick reply.
I compiled the code with the libclc_trunk and linked the bitcode file under
$LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc,
it is currently using the new workitem intrinsics
(commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode
($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code
segment,
define linkonce_odr i32 @get_global_id(i32 %dim) #5 {
entry:
switch i32 %dim, label %get_local_id.exit [
i32 0, label %get_group_id.exit.thread
i32 1, label %get_group_id.exit.thread22
i32 2, label %get_group_id.exit.thread24
]
get_group_id.exit.thread: ; preds = %entry
%x.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #13
%x.i12 = tail call i32 @llvm.r600.read.local.size.x() #3
%mul26 = mul i32 %x.i12, %x.i
%x.i4 = tail call i32 @llvm.amdgcn.workitem.id.x() #13, !range !1
br label %get_local_id.exit
...
}
So it shows that some intrinstics are still using llvm.r600.xxx. I have no
idea if I ever missed something so that it doesn't work.
Thanks.
Best regards,
李弘宇 (Li, Hong-Yu)
Department of Computer Science & Information Engineering
National Taiwan University
On Sun, Mar 6, 2016 at 12:59 AM, Liu Xin <navy.xliu at gmail.com> wrote:
> Li, Hong-Yu,
>
> it's because get_group_id() uses get_local_size
> _CLC_DEF size_t get_global_id(uint dim) {
> return get_group_id(dim)*get_local_size(dim) + get_local_id(dim);
> }
>
> in libclc/amdgcn, 'get_local_size' invokes r600-xxx intrinsics. I
doubt
> that libclc ever supports hsa-runtime before.
>
>
> thanks,
> --lx
>
>
> On Sun, Mar 6, 2016 at 12:11 AM, 李弘宇 via llvm-dev <llvm-dev at
lists.llvm.org
> > wrote:
>
>> Dear Developers,
>>
>> I compiled a OpenCL kernel before (on Nov. last year) like
>>
>> __kernel void g(__global float* array)
>> {
>> array[get_global_id(0)] = 1;
>> }
>>
>> with libclc, which would originally use the instrinsics like
>> llvm.r600.read.local.size.x().
>>
>> I executed the generated object file with one version of the
hsa-runtime
>> [1] provided by Mr. Stellard, when there was more than one workgroup,
the
>> output of the program wasn't correct at that time. I guessed this
might be
>> because get_group_id() always returned 1 (not quite sure what was going
on
>> at that time).
>>
>> When I compile such cases using current llvm trunk, it uses a set of
>> instrinsics starting with llvm.amdgcn, while it still
>> uses llvm.r600.read.local.size.x(). The output LLVM IR code is like:
>>
>> define void @g(float addrspace(1)* nocapture %array) #0 {
>> %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2
>> %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1
>> %mul26.i = mul i32 %x.i12.i, %x.i.i
>> %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7
>> %add.i = add i32 %x.i4.i, %mul26.i
>> %0 = sext i32 %add.i to i64
>> %arrayidx = getelementptr inbounds float, float addrspace(1)* %array,
>> i64 %0
>> store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4,
!tbaa
>> !8
>> ret void
>> }
>>
>> which cannot be handled by llc with the message "the non-hsa
instrinsic
>> with hsa target shown".
>>
>> After looking into the log (r259297), my question is that is there
other
>> intrinsic that support this case when the target is amdgcn--amdhsa? In
the
>> log of r259297, it states that AMDGPUPromoteAlloca pass (a backend
pass)
>> will generate this intrinsic, but even when I just emit-llvm without
going
>> through llc, this intrinsic is still emitted.
>>
>> [1] https://github.com/tstellarAMD/hsa-runtime
>>
>>
>> Regards,
>>
>> 李弘宇 (Li, Hong-Yu)
>> Department of Computer Science & Information Engineering
>> National Taiwan University
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/2bb7c16e/attachment-0001.html>