李弘宇 via llvm-dev
2016-Mar-05 16:11 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
Dear Developers, I compiled a OpenCL kernel before (on Nov. last year) like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which would originally use the instrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard, when there was more than one workgroup, the output of the program wasn't correct at that time. I guessed this might be because get_group_id() always returned 1 (not quite sure what was going on at that time). When I compile such cases using current llvm trunk, it uses a set of instrinsics starting with llvm.amdgcn, while it still uses llvm.r600.read.local.size.x(). The output LLVM IR code is like: define void @g(float addrspace(1)* nocapture %array) #0 { %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 %mul26.i = mul i32 %x.i12.i, %x.i.i %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7 %add.i = add i32 %x.i4.i, %mul26.i %0 = sext i32 %add.i to i64 %arrayidx = getelementptr inbounds float, float addrspace(1)* %array, i64 %0 store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa !8 ret void } which cannot be handled by llc with the message "the non-hsa instrinsic with hsa target shown". After looking into the log (r259297), my question is that is there other intrinsic that support this case when the target is amdgcn--amdhsa? In the log of r259297, it states that AMDGPUPromoteAlloca pass (a backend pass) will generate this intrinsic, but even when I just emit-llvm without going through llc, this intrinsic is still emitted. [1] https://github.com/tstellarAMD/hsa-runtime Regards, 李弘宇 (Li, Hong-Yu) Department of Computer Science & Information Engineering National Taiwan University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/b2fc7564/attachment.html>
Liu Xin via llvm-dev
2016-Mar-05 16:59 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
Li, Hong-Yu, it's because get_group_id() uses get_local_size _CLC_DEF size_t get_global_id(uint dim) { return get_group_id(dim)*get_local_size(dim) + get_local_id(dim); } in libclc/amdgcn, 'get_local_size' invokes r600-xxx intrinsics. I doubt that libclc ever supports hsa-runtime before. thanks, --lx On Sun, Mar 6, 2016 at 12:11 AM, 李弘宇 via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Dear Developers, > > I compiled a OpenCL kernel before (on Nov. last year) like > > __kernel void g(__global float* array) > { > array[get_global_id(0)] = 1; > } > > with libclc, which would originally use the instrinsics like > llvm.r600.read.local.size.x(). > > I executed the generated object file with one version of the hsa-runtime > [1] provided by Mr. Stellard, when there was more than one workgroup, the > output of the program wasn't correct at that time. I guessed this might be > because get_group_id() always returned 1 (not quite sure what was going on > at that time). > > When I compile such cases using current llvm trunk, it uses a set of > instrinsics starting with llvm.amdgcn, while it still > uses llvm.r600.read.local.size.x(). The output LLVM IR code is like: > > define void @g(float addrspace(1)* nocapture %array) #0 { > %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 > %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 > %mul26.i = mul i32 %x.i12.i, %x.i.i > %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7 > %add.i = add i32 %x.i4.i, %mul26.i > %0 = sext i32 %add.i to i64 > %arrayidx = getelementptr inbounds float, float addrspace(1)* %array, > i64 %0 > store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa > !8 > ret void > } > > which cannot be handled by llc with the message "the non-hsa instrinsic > with hsa target shown". > > After looking into the log (r259297), my question is that is there other > intrinsic that support this case when the target is amdgcn--amdhsa? In the > log of r259297, it states that AMDGPUPromoteAlloca pass (a backend pass) > will generate this intrinsic, but even when I just emit-llvm without going > through llc, this intrinsic is still emitted. > > [1] https://github.com/tstellarAMD/hsa-runtime > > > Regards, > > 李弘宇 (Li, Hong-Yu) > Department of Computer Science & Information Engineering > National Taiwan University > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/efd4de1b/attachment.html>
李弘宇 via llvm-dev
2016-Mar-05 17:28 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, Thanks for your quick reply. I compiled the code with the libclc_trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment, define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entry: switch i32 %dim, label %get_local_id.exit [ i32 0, label %get_group_id.exit.thread i32 1, label %get_group_id.exit.thread22 i32 2, label %get_group_id.exit.thread24 ] get_group_id.exit.thread: ; preds = %entry %x.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #13 %x.i12 = tail call i32 @llvm.r600.read.local.size.x() #3 %mul26 = mul i32 %x.i12, %x.i %x.i4 = tail call i32 @llvm.amdgcn.workitem.id.x() #13, !range !1 br label %get_local_id.exit ... } So it shows that some intrinstics are still using llvm.r600.xxx. I have no idea if I ever missed something so that it doesn't work. Thanks. Best regards, 李弘宇 (Li, Hong-Yu) Department of Computer Science & Information Engineering National Taiwan University On Sun, Mar 6, 2016 at 12:59 AM, Liu Xin <navy.xliu at gmail.com> wrote:> Li, Hong-Yu, > > it's because get_group_id() uses get_local_size > _CLC_DEF size_t get_global_id(uint dim) { > return get_group_id(dim)*get_local_size(dim) + get_local_id(dim); > } > > in libclc/amdgcn, 'get_local_size' invokes r600-xxx intrinsics. I doubt > that libclc ever supports hsa-runtime before. > > > thanks, > --lx > > > On Sun, Mar 6, 2016 at 12:11 AM, 李弘宇 via llvm-dev <llvm-dev at lists.llvm.org > > wrote: > >> Dear Developers, >> >> I compiled a OpenCL kernel before (on Nov. last year) like >> >> __kernel void g(__global float* array) >> { >> array[get_global_id(0)] = 1; >> } >> >> with libclc, which would originally use the instrinsics like >> llvm.r600.read.local.size.x(). >> >> I executed the generated object file with one version of the hsa-runtime >> [1] provided by Mr. Stellard, when there was more than one workgroup, the >> output of the program wasn't correct at that time. I guessed this might be >> because get_group_id() always returned 1 (not quite sure what was going on >> at that time). >> >> When I compile such cases using current llvm trunk, it uses a set of >> instrinsics starting with llvm.amdgcn, while it still >> uses llvm.r600.read.local.size.x(). The output LLVM IR code is like: >> >> define void @g(float addrspace(1)* nocapture %array) #0 { >> %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 >> %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 >> %mul26.i = mul i32 %x.i12.i, %x.i.i >> %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7 >> %add.i = add i32 %x.i4.i, %mul26.i >> %0 = sext i32 %add.i to i64 >> %arrayidx = getelementptr inbounds float, float addrspace(1)* %array, >> i64 %0 >> store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa >> !8 >> ret void >> } >> >> which cannot be handled by llc with the message "the non-hsa instrinsic >> with hsa target shown". >> >> After looking into the log (r259297), my question is that is there other >> intrinsic that support this case when the target is amdgcn--amdhsa? In the >> log of r259297, it states that AMDGPUPromoteAlloca pass (a backend pass) >> will generate this intrinsic, but even when I just emit-llvm without going >> through llc, this intrinsic is still emitted. >> >> [1] https://github.com/tstellarAMD/hsa-runtime >> >> >> Regards, >> >> 李弘宇 (Li, Hong-Yu) >> Department of Computer Science & Information Engineering >> National Taiwan University >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/2bb7c16e/attachment-0001.html>