thr3ads.net - llvm dev - [LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all [Feb 2013]


Dmitry Mikushin

2013-Feb-17 14:48 UTC

[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

Hi Justin,

I don't understand why, for instance, the X86 backend handles pow
automatically, while NVPTX is a PITA that requires the user to bring his
own pow implementation. Even at a very general level, this limits users'
interest in the LLVM NVPTX backend. Could you please elaborate on the
rationale behind your point? Why, in your opinion, are the accuracy modes I
suggested not sufficient?

- D.

2013/2/17 Justin Holewinski <justin.holewinski at gmail.com>
> I would be very hesitant to expose all math library functions as
> intrinsics.  I believe linking with a target-specific math library is the
> correct approach, as it decouples the back end from the needs of the source
> program/language.  Users should be free to use any math library
> implementation they choose.  Intrinsics are meant for functions that
> compile down to specific ISA features, like fused multiply-add and square
> root.
> On Feb 16, 2013 8:46 PM, "Dmitry Mikushin" <dmitry at kernelgen.org> wrote:
>
>> Dear Yuan,
>>
>> Sorry for the delayed reply.
>>
>> The answers to your questions could differ depending on where the math
>> library is placed in the code generation pipeline. At KernelGen, we
>> currently have a user-level CUDA math module, adapted from cicc internals
>> [1]. It is intended to be linked with the user LLVM IR module right before
>> proceeding with the final optimization and the backend. For the last few
>> months we have been using this method as a temporary workaround for the
>> absence of many math functions, to keep up the pace of application testing
>> in our compiler test suite. Supplying math this way is not portable and
>> introduces many issues, for instance:
>> 1) The frontend (DragonEgg, in our case) must be taught to emit real math
>> function calls instead of the LLVM intrinsics that NVPTX cannot handle.
>> 2) However, not all intrinsics should be replaced by math calls directly;
>> for example, there is no cdexp call, but it could be modelled with sincos.
>> 3) Our math module assumes sm_20, and could be inefficient or non-portable
>> on other GPU families.
>>
>> Instead of this approach, I think the math library should be implemented
>> *as a lowering pass in the backend*, working directly with intrinsics. In
>> this case, naming is not important, and the final optimization is the job
>> of the backend. But there is another important thing: the backend should
>> codegen math with respect to accuracy settings, specified either as backend
>> options or as function attributes (quite a recent addition to LLVM).
>> Accuracy settings should be:
>> 1) fast-math (ftz, prec-div, prec-sqrt, fma, etc.)
>> 2) whether or not to use GPU-specific low-precision functions (__sin,
>> __cos, etc.)
>>
>> Following the latter approach, math handling in NVPTX will conform to the
>> rest of LLVM, and no host-dependent tweaks will be needed.
>>
>> I'm also interested in contributing to this development at reasonable
>> depth. Moving this part forward entirely on our own would slow down
>> progress on our main targets too much; that's why I'm asking for your help
>> and cooperation.
>>
>> Best regards,
>> - Dima.
>>
>> [1]
>> https://hpcforge.org/scm/viewvc.php/*checkout*/trunk/src/cuda/include/math.bc?root=kernelgen
>>
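Point 2 of Dima's Feb 16 email notes that there is no cdexp call, but that a complex exponential can be modelled with sincos. A minimal C++ sketch of that identity, exp(x + iy) = e^x (cos y + i sin y), is shown here; the function name `cexp_via_sincos` is hypothetical, and `std::sin`/`std::cos` stand in for a single GPU sincos call, which is an assumption about how a real lowering would obtain both values.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical helper: complex exponential built from a real exp plus a
// sine/cosine pair, i.e. exp(x + iy) = e^x * (cos y + i sin y).
// On a GPU the sine/cosine pair would come from one sincos call;
// std::sin and std::cos are portable stand-ins here.
std::complex<double> cexp_via_sincos(std::complex<double> z) {
    const double scale = std::exp(z.real());  // magnitude e^x
    const double s = std::sin(z.imag());      // imaginary direction
    const double c = std::cos(z.imag());      // real direction
    return {scale * c, scale * s};
}
```

The point of the decomposition is that only real-valued primitives (exp, sin, cos) need target support; the complex operation itself never needs its own library entry.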
>> 2013/2/8 Yuan Lin <yulin at nvidia.com>
>>
>>> Yes, it helps a lot, and we are working on it.
>>>
>>> A few questions:
>>>
>>> 1) What will be your use model of this library? Will you run
>>> optimization phases after linking with the library? If so, what are they?
>>>
>>> 2) Do you care if the names of functions differ from those in libm? For
>>> example, it would be gpusin() instead of sin().
>>>
>>> 3) Do you need a different library for different host platforms? Why?
>>>
>>> 4) Any other functions (besides math) you want to see in this library?
>>>
>>> Thanks.
>>>
>>> Yuan
>>>
>>> *From:* Dmitry Mikushin [mailto:dmitry at kernelgen.org]
>>> *Sent:* Thursday, February 07, 2013 2:09 PM
>>> *To:* Justin Holewinski; LLVM Developers Mailing List
>>> *Cc:* Yuan Lin
>>> *Subject:* [NVPTX] We need an LLVM CUDA math library, after all
>>>
>>> Hi Justin, gentlemen,
>>>
>>> I'm afraid I have to escalate this issue at this point. Since it was
>>> first discussed last summer, it was sufficient for us for a while to
>>> disable the lowering of math calls into intrinsics at the DragonEgg level
>>> and link them against CUDA math functions at the LLVM IR level. Now I can
>>> say that this is no longer sufficient, and we need the NVPTX backend to
>>> deal with GPU math.
>>>
>>> > There also is no standard libm for PTX.
>>>
>>> Yes, that's right, but there is an interesting idea: codegen the CUDA
>>> math headers into LLVM IR and link them with the user module at the IR
>>> level. This method gives a perfect degree of flexibility with respect to
>>> high-level languages: the user no longer needs to deal with headers and
>>> can have math right in the IR, regardless of the language it was lowered
>>> from. I can confirm this method works very well for us with C and
>>> Fortran, but in order to make accurate replacements of unsupported
>>> intrinsic calls, it needs to become aware of NVPTX backend capabilities
>>> in the form of:
>>>
>>> bool NVPTXTargetMachine::isIntrinsicSupported(Function& intrinsic)
>>>
>>> and
>>>
>>> string NVPTXTargetMachine::whichMathCallReplacesIntrinsic(Function& intrinsic)
>>>
>>> > I would prefer not to lower such things in the back-end since
>>> > different compilers may want to implement such functions differently
>>> > based on speed vs. accuracy trade-offs.
>>>
>>> Who are those different compilers? We are LLVM, the complete compiler
>>> stack, which should handle these things according to its own preferences.
>>> Derived compilers may certainly think differently, and it's their own
>>> business to change anything they want and never contribute back. We
>>> should not forget that there are a lot of derived projects that use LLVM
>>> directly, like KernelGen or the many embedded DSLs that have recently
>>> started flourishing. Their completeness and future rely on LLVM. For
>>> these reasons, I would strongly prefer that LLVM/NVPTX supply a reference
>>> GPU math implementation, and I invite you and everyone else to form a
>>> joint roadmap to deliver it.
>>>
>>> Before we start: IANAL, but something tells me there could be a
>>> licensing issue with releasing the LLVM IR emitted from the CUDA headers.
>>> Could you please check this with NVIDIA?
>>>
>>> Many thanks,
>>> - D.
>>>
>>> 2012/9/6 Justin Holewinski <justin.holewinski at gmail.com>:
>>> > On 09/06/2012 10:02 AM, Dmitry N. Mikushin wrote:
>>> >>
>>> >> Dear all,
>>> >>
>>> >> During application compilation we get a crash in the NVPTX backend:
>>> >>
>>> >> LLVM ERROR: Cannot select: 0x732b270: i64 = ExternalSymbol'__powisf2' [ID=18]
>>> >>
>>> >> As I understand it, LLVM tries to lower the following call
>>> >>
>>> >> %28 = call ptx_device float @llvm.powi.f32(float 2.000000e+00, i32 %8) nounwind readonly
>>> >>
>>> >> to a device intrinsic. The table llvm/IntrinsicsNVVM.td does not
>>> >> contain such an intrinsic; however, it should be a builtin, according
>>> >> to cuda/include/math_functions.h
>>> >
>>> >
>>> > It actually gets lowered into an external function call.
>>> >
>>> >
>>> >>
>>> >> Is my understanding correct, and do we simply need to add the
>>> >> corresponding definition to llvm/IntrinsicsNVVM.td? How do we do that,
>>> >> and what are the rules?
>>> >
>>> >
>>> > PTX does not have an instruction (or simple series of instructions) that
>>> > implements pow, so this will not be handled.  I would prefer not to
>>> > lower such things in the back-end since different compilers may want to
>>> > implement such functions differently based on speed vs. accuracy
>>> > trade-offs.
>>> >
>>> > There also is no standard libm for PTX.  It is up to the higher-level
>>> > compiler to link against a run-time library that provides functions
>>> > like pow (see include/math_functions.h in a CUDA distribution).
>>> >
>>> >>
>>> >> Thanks,
>>> >> - D.
>>> >> _______________________________________________
>>> >> LLVM Developers mailing list
>>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> >
>>>
>>> >
>>> > --
>>> > Thanks,
>>> >
>>> > Justin Holewinski
>>> >
>>> ------------------------------
>>> This email message is for the sole use of the intended recipient(s) and
>>> may contain confidential information.  Any unauthorized review, use,
>>> disclosure or distribution is prohibited.  If you are not the intended
>>> recipient, please contact the sender by reply email and destroy all copies
>>> of the original message.
>>> ------------------------------
>>>
>>>
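The September 2012 crash in this thread comes from `llvm.powi.f32` being expanded into a call to the runtime helper `__powisf2`, which the NVPTX backend of that time could not select. A minimal C++ sketch of what such a helper computes (a float raised to an integer power by binary exponentiation) is given here; it follows compiler-rt's `__powisf2` semantics as we understand them, but the name `powi_f32` is hypothetical and this is not the CUDA or LLVM implementation.

```cpp
#include <cassert>

// Hypothetical device-library helper modelled on compiler-rt's __powisf2:
// computes a^b for an integer exponent b using square-and-multiply,
// then takes the reciprocal when b is negative.
float powi_f32(float a, int b) {
    const bool recip = b < 0;
    float r = 1.0f;
    while (true) {
        if (b & 1)   // odd exponent bit: fold the current power of a into r
            r *= a;
        b /= 2;      // truncates toward zero, so negative b also reaches 0
        if (b == 0)
            break;
        a *= a;      // square the base for the next exponent bit
    }
    return recip ? 1.0f / r : r;
}
```

A bitcode math library linked before codegen would have to provide a routine like this under whatever name the backend emits; otherwise the call has nothing to resolve against.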

Justin Holewinski

2013-Feb-17 15:01 UTC

[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

The X86 back-end just calls into libm:

  // Always use a library call for pow.
  setOperationAction(ISD::FPOW             , MVT::f32  , Expand);
  setOperationAction(ISD::FPOW             , MVT::f64  , Expand);
  setOperationAction(ISD::FPOW             , MVT::f80  , Expand);


The issue is really that there is no standard math library for PTX.  I
agree that this is a pain for most users, but I don't think the right
solution is to embed a whole suite of math functions into the back-end.
All I'm suggesting is that we instead follow the path of linking in an
external math library of target-specific functions.  Whether you link your
IR with a bitcode library before codegen or have codegen emit library
function calls is an implementation detail, with each having advantages.
The accuracy modes can be used to pick the proper library function in the
latter case, but I still think library function choice is better left up to
the front-end, and the accuracy attributes are a better fit to drive
optimization.



-- 

Thanks,

Justin Holewinski
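Justin's remark that accuracy modes can pick the proper library function when codegen emits library calls can be illustrated with a small lookup. The C++ sketch below is hypothetical throughout: `MathOp`, `AccuracyMode` and `pickLibcall` are invented for illustration, and the returned names (libm-style `sinf`/`pow` for precise results, CUDA-style `__sinf`/`__cosf`/`__powf` for the single-precision low-precision variants from Dima's list) are assumptions about what a PTX math library might export, not an existing LLVM interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical math operations a backend might turn into library calls.
enum class MathOp { Sin, Cos, Pow };

// Hypothetical accuracy setting, e.g. derived from a backend option or a
// function attribute: may fast single-precision approximations be used?
struct AccuracyMode {
    bool allowApproxFloat = false;
};

// Picks a library function name for `op`. All names are assumptions:
// libm-style names for precise versions, double-underscore names for the
// low-precision single-precision versions.
std::string pickLibcall(MathOp op, bool isDouble, AccuracyMode mode) {
    // Only single precision has fast variants in this sketch.
    const bool approx = mode.allowApproxFloat && !isDouble;
    switch (op) {
    case MathOp::Sin:
        return approx ? "__sinf" : (isDouble ? "sin" : "sinf");
    case MathOp::Cos:
        return approx ? "__cosf" : (isDouble ? "cos" : "cosf");
    case MathOp::Pow:
        return approx ? "__powf" : (isDouble ? "pow" : "powf");
    }
    return "";  // not reached for valid MathOp values
}
```

Whether a table like this lives in the front-end (Justin's preference) or in a backend lowering pass (Dima's proposal) is exactly the open question in this thread; the lookup itself looks the same in either place.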

Dmitry Mikushin

2013-Feb-17 17:24 UTC

[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

> The issue is really that there is no standard math library for PTX.

Well, formally, that may very well be true. Moreover, some parts of the CPU
math standard are impossible to accomplish on parallel architectures;
consider, for example, errno behavior. But here we are speaking more about
the practical side. And the practical side is: for the past 5 years CUDA
has claimed to accelerate compute applications, and that implies good math
support. For clarity, we can drop the term "LLVM CUDA math library" and
instead speak of the need for all of LLVM to have "the same degree of math
support" that CUDA currently has for C/C++.

If you think having the math module outside of the backend is more
feasible, that is also a way to go, but please see what we need in that
case in my first email: either way, the NVPTX backend will have to tell us
which intrinsics it is going to lower and which ones will make it crash.
So something in the backend has to be modified anyway.

- D.
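
Dmitry's requirement here, that the backend report which intrinsics it lowers natively and which must become library calls, corresponds to the query pair proposed in his Feb 7 email. The C++ sketch below models that pair as a standalone lookup table. The class name `MathCapabilities`, the use of intrinsic name strings instead of `llvm::Function&`, and every table entry are hypothetical assumptions for illustration, not the NVPTX backend's actual API or coverage.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for the proposed isIntrinsicSupported /
// whichMathCallReplacesIntrinsic queries. Intrinsics are identified by
// name; the table contents are illustrative assumptions only.
class MathCapabilities {
public:
    MathCapabilities() {
        // Intrinsics assumed to lower directly to PTX instructions.
        native_ = {"llvm.sqrt.f32", "llvm.fma.f32", "llvm.fabs.f32"};
        // Intrinsics assumed to need a call into an external math library.
        replacements_ = {{"llvm.pow.f32", "powf"},
                         {"llvm.sin.f32", "sinf"},
                         {"llvm.powi.f32", "__powisf2"}};
    }

    bool isIntrinsicSupported(const std::string& name) const {
        return native_.count(name) != 0;
    }

    // std::nullopt means no known replacement: the case that would
    // otherwise end in a "Cannot select" crash.
    std::optional<std::string>
    whichMathCallReplacesIntrinsic(const std::string& name) const {
        auto it = replacements_.find(name);
        if (it == replacements_.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::set<std::string> native_;
    std::map<std::string, std::string> replacements_;
};
```

With such a query available, an IR-level math module could replace exactly the intrinsics the backend cannot handle and leave the rest for native lowering.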
