thr3ads.net - llvm dev - [llvm-dev] Math functions for CUDA patch [Jun 2018]

Artem Belevich via llvm-dev

2018-Jun-15 22:00 UTC

[llvm-dev] Math functions for CUDA patch

+CC: jlebar, llvm-dev@ as this may be of interest to other users of the
NVPTX back-end.

On Thu, Jun 14, 2018 at 9:33 AM Gheorghe-Teod Bercea <
Gheorghe-Teod.Bercea at ibm.com> wrote:
> Hi Artem,
>
> I hope things are well.
>
> Just touching base regarding the patch I posted last week:
> https://reviews.llvm.org/D47849
>
> Based on your expertise with the CUDA toolchain in Clang, are math functions
> translated to device functions at all at optimization levels of O1 or
> higher?
>
> I have been having a mixed experience with that. On the device side, for
> CUDA, some functions (like pow) will be translated to a device version but
> some functions like sqrt will use the llvm intrinsic version even though an
> nvvm version of the function exists.

> I have been trying to leverage the existing CUDA functionality for the
> OpenMP device toolchain. I've been able to get OpenMP to do exactly what
> CUDA does, but my question is: does CUDA do the right thing by using llvm
> intrinsics on the device side? Or do we perhaps need to fix CUDA too?

AFAICT, clang does not do anything special to translate math library
calls into libdevice calls. We do include CUDA SDK headers that end up
providing device-side overloads for a subset of libm calls. See
include/math_functions.hpp in the CUDA SDK.
That header maps math functions to the __nv_* functions that come with the
CUDA SDK's libdevice bitcode. We link in the necessary bits of bitcode before
passing it to LLVM.
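
A minimal host-side C++ sketch of the header pattern described above, to make
the forwarding concrete. Everything in it is an assumption for illustration:
the `sketch` namespace and the `nv_sqrtf`/`nv_sqrt` stand-ins are hypothetical
and simply call the host libm, whereas in a real CUDA compilation the targets
would be the __nv_* functions from libdevice bitcode. It is not a copy of
math_functions.hpp.

```cpp
#include <cmath>

namespace sketch {

// Hypothetical stand-ins for libdevice entry points. On a real device build
// these would be __nv_* functions linked in from libdevice bitcode; here they
// just call the host math library so the sketch compiles anywhere.
inline float nv_sqrtf(float x) { return std::sqrt(x); }
inline double nv_sqrt(double x) { return std::sqrt(x); }

// Header-style overloads: the familiar math name forwards to the
// libdevice-style function, chosen by ordinary overload resolution.
// No compiler magic is involved in picking the target.
inline float sqrt(float x) { return nv_sqrtf(x); }
inline double sqrt(double x) { return nv_sqrt(x); }

}  // namespace sketch
```

Calling `sketch::sqrt(16.0f)` selects the float overload and ends up in
`nv_sqrtf`, which mirrors how a libm call in device code can reach a
libdevice function purely through header overloads.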

Those libdevice functions in turn sometimes use NVPTX-specific intrinsics.
E.g. fsqrt has this IR:

    define float @__nv_fsqrt_rn(float %x) #0 {
      ...
      %3 = call float @llvm.nvvm.sqrt.rn.ftz.f(float %x)

Then LLVM replaces calls to some of those intrinsics with their LLVM
counterparts:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1466

This way LLVM has the ability to reason about these calls and can optimize
some of them.
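
The replacement step can be pictured as a lookup from a target-specific
intrinsic name to a generic one. The sketch below is a simplified model under
stated assumptions, not LLVM's code: `genericCounterpart` is a hypothetical
helper, the two table entries are illustrative, and the actual transform in
InstCombineCalls.cpp works on intrinsic IDs and applies further conditions
that are omitted here.

```cpp
#include <string>
#include <unordered_map>

namespace sketch {

// Returns the name of the generic LLVM intrinsic assumed to be equivalent to
// the given NVPTX-specific intrinsic, or nullptr when no rewrite is known and
// the call has to stay target-specific.
// The table is illustrative only; it is not LLVM's actual mapping.
inline const char* genericCounterpart(const std::string& name) {
  static const std::unordered_map<std::string, const char*> table = {
      {"llvm.nvvm.fabs.f", "llvm.fabs.f32"},
      {"llvm.nvvm.sqrt.rn.ftz.f", "llvm.sqrt.f32"},
  };
  auto it = table.find(name);
  return it == table.end() ? nullptr : it->second;
}

}  // namespace sketch
```

A name with an entry in the table is rewritten to its generic form, which
generic optimization passes can reason about; a name without one is left as an
opaque target-specific call.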

So, depending on which optimizations run, you may or may not see some of
these transformations, which hopefully explains the inconsistencies you have
seen.


> Please let me know your thoughts on this.
>
> Thanks,
>
> --Doru
-- 
--Artem Belevich

Justin Lebar via llvm-dev

2018-Jun-15 22:16 UTC

In general we try to convert nvvm intrinsics to proper LLVM intrinsics, so
that LLVM can understand what's going on and optimize the code. There's a
whole bunch of these in AutoUpgrade.cpp; search for "nvvm".

The llvm/nvvm intrinsics are ultimately translated to the same PTX.

