thr3ads.net - llvm dev - [llvm-dev] cuda __shfl

George K via llvm-dev

2020-Sep-25 08:18 UTC

[llvm-dev] cuda __shfl_sync problem

Do you mean in llc? Because I don't see such an option, I'm afraid.

~George

On 24-09-2020 20:54, Johannes Doerfert wrote:
> Not that I am an expert but it looks like it defaults to the minimal
> PTX version that supports the compute capability. You might be able to 
> choose PTX 6.0 though.
>
> ~ Johannes
>
>
> On 9/24/20 1:02 PM, George K via llvm-dev wrote:
>> Hi,
>>
>> First of all, I'm not sure if I should be posting this here or in
>> cfe-dev, but here it goes.
>>
>> In order to instrument CUDA kernels, I first generate device IR with:
>>
>> clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 
>> -o device.bc
>>
>> I also have a library that contains the instrumentation stubs, for
>> which I generate IR similarly, and I link it with the device IR
>> programmatically with Linker::linkModules(..)
>>
>> Then, after some analysis, I use llc to get PTX:
>>
>> llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx
>>
>> This works fine, but the problem is that the instrumentation code uses
>> __shfl_sync() and ptxas gives me the following error:
>>
>> ptxas device.ptx, line 1033; error   : Feature 'shfl.sync' requires
>> PTX ISA .version 6.0 or later
>>
>> Now according to
>> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions,
>> __shfl_sync is supported by compute capability >= 3 and according to
>> https://developer.nvidia.com/cuda-gpus#compute my GTX950 has Compute
>> Capability 5.2.
>>
>> Also according to
>> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
>> PTX ISA 6.0 does support sm_52.
>>
>> However llc generates:
>>
>> .version 4.1
>> .target sm_52, debug
>> .address_size 64
>>
>> Any ideas why this is happening? Or am I doing something wrong?
>>
>> PS. I'm using CUDA 10, driver 440
>>
>> ~George
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
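
For readers following along, the mismatch described in the quoted message can be spotted before ptxas runs by reading the `.version` directive at the top of the generated file. The following is a minimal sketch, assuming a POSIX shell with awk; the function names `ptx_version` and `ptx_supports_shfl_sync` are hypothetical helpers, not part of LLVM or the CUDA toolkit.

```shell
# Hypothetical helper: print the PTX ISA version declared in a .ptx file,
# e.g. "4.1" for a file whose header contains ".version 4.1".
ptx_version() {
  awk '$1 == ".version" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed only if the declared version is at least 6.0,
# the minimum that the ptxas error above names for 'shfl.sync'.
ptx_supports_shfl_sync() {
  ver=$(ptx_version "$1")
  [ -n "$ver" ] && awk -v v="$ver" 'BEGIN { exit !(v + 0 >= 6.0) }'
}
```

It could be used as `ptx_supports_shfl_sync device.ptx || echo "PTX $(ptx_version device.ptx) is too old for shfl.sync"`, which would flag the `.version 4.1` header quoted above.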

Johannes Doerfert via llvm-dev

2020-Sep-25 15:05 UTC

[llvm-dev] cuda __shfl_sync problem

Have you tried `-target-feature +ptx60`?



George K via llvm-dev

2020-Sep-28 12:30 UTC

[llvm-dev] cuda __shfl_sync problem

I couldn't find `-target-feature`. I am on LLVM 10. Has the interface
changed, maybe?

Fortunately, `-mattr=+ptx60` did the trick.

George
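
Put together with the command from the original message, the working invocation looks roughly like the sketch below. It only combines the `llc` flags quoted in this thread with `-mattr=+ptx60`; the wrapper function `build_ptx` and its argument names are hypothetical, and nothing here is output copied from the thread.

```shell
# Hypothetical wrapper around the llc command from the thread, with the
# -mattr=+ptx60 attribute added so the NVPTX backend targets PTX ISA 6.0.
build_ptx() {
  in_bc=$1    # linked device bitcode, e.g. device.bc
  out_ptx=$2  # PTX output path, e.g. device.ptx
  llc "$in_bc" --march=nvptx64 --mcpu=sm_52 -mattr=+ptx60 \
      --filetype=asm -o "$out_ptx"
}
```

After `build_ptx device.bc device.ptx`, the header printed by `head -n 3 device.ptx` should declare PTX ISA 6.0 instead of 4.1, which is what ptxas asks for when it sees `shfl.sync`.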

