Hi,

First of all, I'm not sure whether I should be posting this here or in cfe-dev, but here it goes.

In order to instrument CUDA kernels, I first generate device IR with:

    clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc

I also have a library that contains the instrumentation stubs. I generate its IR the same way and link it with the device IR programmatically using Linker::linkModules(..).

Then, after some analysis, I use llc to get PTX:

    llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx

This works fine, but the instrumentation code uses __shfl_sync(), and ptxas gives me the following error:

    ptxas device.ptx, line 1033; error : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later

According to https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions, __shfl_sync is supported by compute capability >= 3, and according to https://developer.nvidia.com/cuda-gpus#compute my GTX950 has compute capability 5.2. Also, according to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes, PTX ISA 6.0 does support sm_52.

However, llc generates:

    .version 4.1
    .target sm_52, debug
    .address_size 64

Any ideas why this is happening? Or am I doing something wrong?

PS. I'm using CUDA 10, driver 440.

~George
Not that I am an expert, but it looks like it defaults to the minimal PTX version that supports the compute capability. You might be able to choose PTX 6.0, though.

~ Johannes

On 9/24/20 1:02 PM, George K via llvm-dev wrote:
> However, llc generates:
>
>     .version 4.1
>     .target sm_52, debug
>     .address_size 64
>
> Any ideas why this is happening? Or am I doing something wrong?
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Do you mean in llc? Because I don't see such an option, I'm afraid.

~George

On 24-09-2020 20:54, Johannes Doerfert wrote:
> Not that I am an expert, but it looks like it defaults to the minimal
> PTX version that supports the compute capability. You might be able to
> choose PTX 6.0, though.
>
> ~ Johannes