李弘宇 via llvm-dev
2016-Mar-05 17:28 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, Thanks for your quick reply. I compiled the code with the libclc_trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment, define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entry: switch i32 %dim, label %get_local_id.exit [ i32 0, label %get_group_id.exit.thread i32 1, label %get_group_id.exit.thread22 i32 2, label %get_group_id.exit.thread24 ] get_group_id.exit.thread: ; preds = %entry %x.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #13 %x.i12 = tail call i32 @llvm.r600.read.local.size.x() #3 %mul26 = mul i32 %x.i12, %x.i %x.i4 = tail call i32 @llvm.amdgcn.workitem.id.x() #13, !range !1 br label %get_local_id.exit ... } So it shows that some intrinstics are still using llvm.r600.xxx. I have no idea if I ever missed something so that it doesn't work. Thanks. Best regards, 李弘宇 (Li, Hong-Yu) Department of Computer Science & Information Engineering National Taiwan University On Sun, Mar 6, 2016 at 12:59 AM, Liu Xin <navy.xliu at gmail.com> wrote:> Li, Hong-Yu, > > it's because get_group_id() uses get_local_size > _CLC_DEF size_t get_global_id(uint dim) { > return get_group_id(dim)*get_local_size(dim) + get_local_id(dim); > } > > in libclc/amdgcn, 'get_local_size' invokes r600-xxx intrinsics. I doubt > that libclc ever supports hsa-runtime before. > > > thanks, > --lx > > > On Sun, Mar 6, 2016 at 12:11 AM, 李弘宇 via llvm-dev <llvm-dev at lists.llvm.org > > wrote: > >> Dear Developers, >> >> I compiled a OpenCL kernel before (on Nov. last year) like >> >> __kernel void g(__global float* array) >> { >> array[get_global_id(0)] = 1; >> } >> >> with libclc, which would originally use the instrinsics like >> llvm.r600.read.local.size.x(). >> >> I executed the generated object file with one version of the hsa-runtime >> [1] provided by Mr. Stellard, when there was more than one workgroup, the >> output of the program wasn't correct at that time. I guessed this might be >> because get_group_id() always returned 1 (not quite sure what was going on >> at that time). >> >> When I compile such cases using current llvm trunk, it uses a set of >> instrinsics starting with llvm.amdgcn, while it still >> uses llvm.r600.read.local.size.x(). The output LLVM IR code is like: >> >> define void @g(float addrspace(1)* nocapture %array) #0 { >> %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 >> %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 >> %mul26.i = mul i32 %x.i12.i, %x.i.i >> %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7 >> %add.i = add i32 %x.i4.i, %mul26.i >> %0 = sext i32 %add.i to i64 >> %arrayidx = getelementptr inbounds float, float addrspace(1)* %array, >> i64 %0 >> store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa >> !8 >> ret void >> } >> >> which cannot be handled by llc with the message "the non-hsa instrinsic >> with hsa target shown". >> >> After looking into the log (r259297), my question is that is there other >> intrinsic that support this case when the target is amdgcn--amdhsa? In the >> log of r259297, it states that AMDGPUPromoteAlloca pass (a backend pass) >> will generate this intrinsic, but even when I just emit-llvm without going >> through llc, this intrinsic is still emitted. >> >> [1] https://github.com/tstellarAMD/hsa-runtime >> >> >> Regards, >> >> 李弘宇 (Li, Hong-Yu) >> Department of Computer Science & Information Engineering >> National Taiwan University >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/2bb7c16e/attachment-0001.html>
Liu Xin via llvm-dev
2016-Mar-05 17:46 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
I think the "define linkonce_odr i32 @get_global_id(i32 %dim) #5" you dumped is llvm IR after inlining and opt. The commit you mentioned(ba9858) doesn't change get_local_size() at all. I never worked on OpenCL+HSA. I just wonder if libclc supports HSA. HSA RT uses 'hsa_kernel_dispatch_packet_t' to get know workgroup size and grid size. so far, I didn't see hsa-specific implementation appears in libclc. thanks, --lx On Sun, Mar 6, 2016 at 1:28 AM, 李弘宇 <zhenlinospirit at gmail.com> wrote:> Hi Mr. Liu, > > Thanks for your quick reply. > > I compiled the code with the libclc_trunk and linked the bitcode file > under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the > libclc, it is currently using the new workitem intrinsics > (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode > ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code > segment, > > define linkonce_odr i32 @get_global_id(i32 %dim) #5 { > entry: > switch i32 %dim, label %get_local_id.exit [ > i32 0, label %get_group_id.exit.thread > i32 1, label %get_group_id.exit.thread22 > i32 2, label %get_group_id.exit.thread24 > ] > > get_group_id.exit.thread: ; preds = %entry > %x.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #13 > %x.i12 = tail call i32 @llvm.r600.read.local.size.x() #3 > %mul26 = mul i32 %x.i12, %x.i > %x.i4 = tail call i32 @llvm.amdgcn.workitem.id.x() #13, !range !1 > br label %get_local_id.exit > ... > } > > So it shows that some intrinstics are still using llvm.r600.xxx. I have no > idea if I ever missed something so that it doesn't work. > > Thanks. > > Best regards, > > 李弘宇 (Li, Hong-Yu) > Department of Computer Science & Information Engineering > National Taiwan University > > On Sun, Mar 6, 2016 at 12:59 AM, Liu Xin <navy.xliu at gmail.com> wrote: > >> Li, Hong-Yu, >> >> it's because get_group_id() uses get_local_size >> _CLC_DEF size_t get_global_id(uint dim) { >> return get_group_id(dim)*get_local_size(dim) + get_local_id(dim); >> } >> >> in libclc/amdgcn, 'get_local_size' invokes r600-xxx intrinsics. I >> doubt that libclc ever supports hsa-runtime before. >> >> >> thanks, >> --lx >> >> >> On Sun, Mar 6, 2016 at 12:11 AM, 李弘宇 via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> Dear Developers, >>> >>> I compiled a OpenCL kernel before (on Nov. last year) like >>> >>> __kernel void g(__global float* array) >>> { >>> array[get_global_id(0)] = 1; >>> } >>> >>> with libclc, which would originally use the instrinsics like >>> llvm.r600.read.local.size.x(). >>> >>> I executed the generated object file with one version of the hsa-runtime >>> [1] provided by Mr. Stellard, when there was more than one workgroup, the >>> output of the program wasn't correct at that time. I guessed this might be >>> because get_group_id() always returned 1 (not quite sure what was going on >>> at that time). >>> >>> When I compile such cases using current llvm trunk, it uses a set of >>> instrinsics starting with llvm.amdgcn, while it still >>> uses llvm.r600.read.local.size.x(). The output LLVM IR code is like: >>> >>> define void @g(float addrspace(1)* nocapture %array) #0 { >>> %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 >>> %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 >>> %mul26.i = mul i32 %x.i12.i, %x.i.i >>> %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7 >>> %add.i = add i32 %x.i4.i, %mul26.i >>> %0 = sext i32 %add.i to i64 >>> %arrayidx = getelementptr inbounds float, float addrspace(1)* %array, >>> i64 %0 >>> store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, >>> !tbaa !8 >>> ret void >>> } >>> >>> which cannot be handled by llc with the message "the non-hsa instrinsic >>> with hsa target shown". >>> >>> After looking into the log (r259297), my question is that is there >>> other intrinsic that support this case when the target is amdgcn--amdhsa? >>> In the log of r259297, it states that AMDGPUPromoteAlloca pass (a >>> backend pass) will generate this intrinsic, but even when I just emit-llvm >>> without going through llc, this intrinsic is still emitted. >>> >>> [1] https://github.com/tstellarAMD/hsa-runtime >>> >>> >>> Regards, >>> >>> 李弘宇 (Li, Hong-Yu) >>> Department of Computer Science & Information Engineering >>> National Taiwan University >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160306/8ba213ef/attachment.html>
Matt Arsenault via llvm-dev
2016-Mar-05 22:28 UTC
[llvm-dev] [AMDGPU] non-hsa intrinsic with hsa target
> On Mar 5, 2016, at 09:46, Liu Xin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > I never worked on OpenCL+HSA. I just wonder if libclc supports HSA.It does not currently, and only supports Clover’s ABI. The reason there are still r600 related intrinsics being used for workitems is mostly because I haven’t gotten around to fixing it. Clover should be reading these from an offset from the kernel argument pointer rather than having special case intrinsics, or it could start putting arguments somewhere else.> > HSA RT uses 'hsa_kernel_dispatch_packet_t' to get know workgroup size and grid size. so far, I didn't see hsa-specific implementation appears in libclc.The Mesa ABI reads items out of a hidden kernel argument area before the true arguments, while HSA reads from the dispatch packet pointer, which has an intrinsic for it. You can see how these are implemented here: https://bitbucket.org/multicoreware/hcc/src/33432be0ab37668e55f1f534294d7525587518a4/lib/hsail-amdgpu-wrapper.ll?at=master&fileviewer=file-view-default <https://bitbucket.org/multicoreware/hcc/src/33432be0ab37668e55f1f534294d7525587518a4/lib/hsail-amdgpu-wrapper.ll?at=master&fileviewer=file-view-default> -Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160305/7ce103ea/attachment.html>