Do you mean in llc? Because i don't see such an option i'm afraid.

~George

On 24-09-2020 20:54, Johannes Doerfert wrote:
> Not that I am an expert but it looks like it defaults to the minimal
> PTX version that supports the compute capability. You might be able to
> choose PTX 6.0 though.
>
> ~ Johannes
>
>
> On 9/24/20 1:02 PM, George K via llvm-dev wrote:
>> Hi,
>>
>> First of all, i'm not sure if i should be posting this here or in
>> cfe-dev, but here it goes.
>>
>> In order to instrument CUDA kernels i first generate device IR with:
>>
>>   clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc
>>
>> I also have a library that contains the instrumentation stubs for
>> which i generate IR similarly and i link it with the device IR
>> programmatically with Linker::linkModules(..)
>>
>> Then after some analysis i use llc to get ptx:
>>
>>   llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx
>>
>> This works fine but the problem is that the instrumentation code uses
>> __shfl_sync() and ptxas gives me the following error:
>>
>>   ptxas device.ptx, line 1033; error : Feature 'shfl.sync' requires
>>   PTX ISA .version 6.0 or later
>>
>> Now according to
>> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions,
>> __shfl_sync is supported by compute capability >= 3 and according to
>> https://developer.nvidia.com/cuda-gpus#compute my GTX950 has Compute
>> Capability 5.2.
>>
>> Also according to
>> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
>> PTX ISA 6.0 does support sm_52.
>>
>> However llc generates:
>>
>>   .version 4.1
>>   .target sm_52, debug
>>   .address_size 64
>>
>> Any ideas why this is happening? Or am i doing something wrong?
>>
>> PS. I'm using CUDA 10, driver 440
>>
>> ~George
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Have you tried `-target-feature +ptx60`?

On 9/25/20 3:18 AM, George K wrote:
> Do you mean in llc? Because i don't see such an option i'm afraid.
>
> ~George
I couldn't find `-target-feature`. I am on llvm 10. Has the interface changed maybe?

Fortunately, `-mattr=+ptx60` did the trick.

George

On 25-09-2020 17:05, Johannes Doerfert wrote:
> Have you tried `-target-feature +ptx60`?
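
For anyone who finds this thread later, here is a minimal sketch of the working llc step with a quick check of the resulting `.version` directive. As far as I know, `-target-feature` is a clang `-cc1` option rather than an llc one, which would explain why it was not found in llc; the llc counterpart is `-mattr`. The file names are the ones used earlier in the thread, and `ptx_version` is a hypothetical helper written only for this example, not part of LLVM or CUDA.

```shell
#!/bin/sh
# Sketch only: device.bc and device.ptx are the file names from the thread.

# ptx_version FILE: print the number from the first ".version" directive.
# (Hypothetical helper for this example.)
ptx_version() {
    awk '$1 == ".version" { print $2; exit }' "$1"
}

# Regenerate the PTX with the PTX 6.0 feature enabled. The guard keeps the
# script harmless on machines without llc or without device.bc.
if command -v llc >/dev/null 2>&1 && [ -f device.bc ]; then
    llc device.bc --march=nvptx64 --mcpu=sm_52 -mattr=+ptx60 \
        --filetype=asm -o device.ptx &&
        echo "PTX version: $(ptx_version device.ptx)"
fi
```

If the flag took effect, the reported version should be 6.0 rather than the 4.1 shown in the original post, and ptxas should no longer reject `shfl.sync` on version grounds.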