George K via llvm-dev

2020-Sep-24 18:02 UTC

[llvm-dev] cuda __shfl_sync problem

Hi,

First of all, I'm not sure whether I should be posting this here or in
cfe-dev, but here it goes.

In order to instrument CUDA kernels, I first generate device IR with:

clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc

I also have a library that contains the instrumentation stubs, for which
I generate IR in the same way, and I link it with the device IR
programmatically using Linker::linkModules(..).

Then, after some analysis, I use llc to get PTX:

llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx

This works fine, but the problem is that the instrumentation code uses
__shfl_sync(), and ptxas gives me the following error:

ptxas device.ptx, line 1033; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later

Now, according to
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions,
__shfl_sync is supported by compute capability >= 3, and according to
https://developer.nvidia.com/cuda-gpus#compute my GTX 950 has Compute
Capability 5.2.

Also, according to
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes,
PTX ISA 6.0 does support sm_52.

However, llc generates:

.version 4.1
.target sm_52, debug
.address_size 64

Any ideas why this is happening? Or am I doing something wrong?

PS. I'm using CUDA 10, driver 440

~George
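
The mismatch described above, between the version ptxas requires and the
version in the emitted header, can be checked before ptxas is ever called.
The following is only a minimal sketch: the function names ptx_version and
ptx_version_at_least are illustrative and not part of any toolchain, and it
assumes the PTX file carries a single ".version MAJOR.MINOR" directive in
its header, as in the output quoted in this message.

```shell
# Print the PTX ISA version declared in a PTX file, e.g. "4.1".
# Only the first ".version" directive is considered.
ptx_version() {
    awk '$1 == ".version" { print $2; exit }' "$1"
}

# Succeed (exit status 0) if the version declared in FILE is at least WANT.
# Usage: ptx_version_at_least FILE WANT   (WANT is MAJOR.MINOR, e.g. 6.0)
# Returns 2 if no ".version" directive is found.
ptx_version_at_least() {
    have=$(ptx_version "$1")
    [ -n "$have" ] || return 2
    awk -v have="$have" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        ok = (h[1] + 0 > w[1] + 0) || (h[1] + 0 == w[1] + 0 && h[2] + 0 >= w[2] + 0)
        exit !ok
    }'
}
```

With the header shown in this message, "ptx_version device.ptx" would print
4.1, and "ptx_version_at_least device.ptx 6.0" would fail, which matches the
ptxas complaint about shfl.sync.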


Johannes Doerfert via llvm-dev

2020-Sep-24 18:54 UTC

[llvm-dev] cuda __shfl_sync problem

Not that I am an expert, but it looks like llc defaults to the minimal
PTX version that supports the requested compute capability. You might be
able to choose PTX 6.0 explicitly, though.

~ Johannes



George K via llvm-dev

2020-Sep-25 08:18 UTC

[llvm-dev] cuda __shfl_sync problem

Do you mean in llc? Because I don't see such an option, I'm afraid.

~George
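
One possible route, offered as a sketch rather than as an answer confirmed
in this thread: the NVPTX backend models PTX ISA versions as subtarget
features, so llc's generic -mattr option can be used to request a newer
version than the default for the chosen -mcpu. The wrapper below is
hypothetical (the names emit_ptx and LLC do not come from the thread), and
it assumes an LLVM build whose NVPTX backend knows the ptx60 feature and a
ptxas recent enough to accept PTX 6.0.

```shell
# Hypothetical wrapper around the llc invocation from the original post.
# LLC may be overridden to point at a specific llc binary (assumption:
# by default an llc with the NVPTX backend is on PATH).
LLC="${LLC:-llc}"

# emit_ptx INPUT.bc OUTPUT.ptx [PTX_FEATURE]
# PTX_FEATURE defaults to ptx60, since ptxas reports that shfl.sync
# requires PTX ISA 6.0 or later.
emit_ptx() {
    in=$1
    out=$2
    feature=${3:-ptx60}
    "$LLC" "$in" -march=nvptx64 -mcpu=sm_52 -mattr="+$feature" \
        -filetype=asm -o "$out"
}
```

Calling "emit_ptx device.bc device.ptx" runs the same llc command as in the
original post with -mattr=+ptx60 added. If the backend accepts the feature,
the header of device.ptx should then declare .version 6.0 instead of 4.1;
inspecting the first lines of the file is a quick way to confirm that
before running ptxas.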

