thr3ads.net - llvm dev - [LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all [Feb 2013]

Dmitry Mikushin

2013-Feb-07 22:08 UTC

[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

Hi Justin, gentlemen,

I'm afraid I have to escalate this issue at this point. Since it was first
discussed last summer, it was sufficient for us for a while to have the
lowering of math calls into intrinsics disabled at the DragonEgg level, and
to link them against CUDA math functions at the LLVM IR level. Now I can
say: this is no longer sufficient, and we need the NVPTX backend to deal
with GPU math.

> There also is no standard libm for PTX.

Yes, that's right, but there is an interesting idea to codegen the CUDA math
headers into LLVM IR and link it with the user module at the IR level. This
method gives a perfect degree of flexibility with respect to high-level
languages: the user no longer needs to deal with headers and can have math
right in the IR, regardless of the language it was lowered from. I can
confirm this method works very well for us with C and Fortran, but in order
to make accurate replacements of unsupported intrinsic calls, it needs to
become aware of NVPTX backend capabilities in the form of:

bool NVPTXTargetMachine::isIntrinsicSupported(Function& intrinsic)

and

string NVPTXTargetMachine::whichMathCallReplacesIntrinsic(Function& intrinsic)
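
To make the proposal concrete, here is a minimal sketch of how the two queries could behave. It is not part of the NVPTX backend: it works on intrinsic names as plain strings rather than on llvm::Function, the free functions only mirror the proposed member names, and the table entries (which intrinsics count as supported, which library call replaces which intrinsic) are assumptions chosen for illustration.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical table: intrinsic name -> replacement math function name.
// An empty replacement means "assumed to be handled by the backend itself".
// The entries are illustrative and are not taken from the NVPTX sources.
inline const std::unordered_map<std::string_view, std::string_view>&
intrinsicTable() {
    static const std::unordered_map<std::string_view, std::string_view> table = {
        {"llvm.sqrt.f32", ""},       // assumed natively supported
        {"llvm.powi.f32", "powif"},  // assumed to need a library call
        {"llvm.pow.f32", "powf"},
        {"llvm.sin.f32", "sinf"},
    };
    return table;
}

// True only for intrinsics the table marks as natively supported.
bool isIntrinsicSupported(std::string_view intrinsic) {
    auto it = intrinsicTable().find(intrinsic);
    return it != intrinsicTable().end() && it->second.empty();
}

// Name of the math call that should replace the intrinsic, if any.
std::optional<std::string> whichMathCallReplacesIntrinsic(std::string_view intrinsic) {
    auto it = intrinsicTable().find(intrinsic);
    if (it == intrinsicTable().end() || it->second.empty()) return std::nullopt;
    return std::string(it->second);
}
```

A front-end or IR-level pass could consult such a table before linking the math module, rewriting only those calls for which a replacement name is returned.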

> I would prefer not to lower such things in the back-end since different
> compilers may want to implement such functions differently based on speed
> vs. accuracy trade-offs.

Who are those different compilers? We are LLVM, the complete compiler
stack, which should handle these things according to its own preference.
Derived compilers may certainly think differently, and it's their own
business to change anything they want and never contribute back. We should
not forget there are a lot of derived projects that use LLVM directly, like
KernelGen or the many embedded DSLs that have recently started flourishing.
Their completeness and future rely on LLVM. For these reasons, I would
strongly prefer that LLVM/NVPTX supply a reference GPU math implementation,
and I invite you and everyone else to form a joint roadmap to deliver it.

Before we start: IANAL, but something tells me there could be a licensing
issue with releasing the LLVM IR emitted from the CUDA headers.
Could you please check this with NVIDIA?

Many thanks,
- D.

2012/9/6 Justin Holewinski <justin.holewinski at gmail.com>:
> On 09/06/2012 10:02 AM, Dmitry N. Mikushin wrote:
>>
>> Dear all,
>>
>> During app compilation we have a crash in NVPTX backend:
>>
>> LLVM ERROR: Cannot select: 0x732b270: i64 = ExternalSymbol'__powisf2'
>> [ID=18]
>>
>> As I understand LLVM tries to lower the following call
>>
>> %28 = call ptx_device float @llvm.powi.f32(float 2.000000e+00, i32 %8)
>> nounwind readonly
>>
>> to device intrinsic. The table llvm/IntrinsicsNVVM.td does not contain
>> such intrinsic, however it should be builtin, according to
>> cuda/include/math_functions.h
>
>
> It actually gets lowered into an external function call.
>
>
>>
>> Is my understanding correct, and we need simply add the corresponding
>> definition to llvm/IntrinsicsNVVM.td ? How to do that, what are the
>> rules?
>
>
> PTX does not have an instruction (or simple series of instructions) that
> implements pow, so this will not be handled.  I would prefer not to lower
> such things in the back-end since different compilers may want to implement
> such functions differently based on speed vs. accuracy trade-offs.
>
> There also is no standard libm for PTX.  It is up to the higher-level
> compiler to link against a run-time library that provides functions like pow
> (see include/math_functions.h in a CUDA distribution).
>
>>
>> Thanks,
>> - D.
>
>
> --
> Thanks,
>
> Justin Holewinski

Yuan Lin

2013-Feb-08 00:38 UTC

Yes, it helps a lot and we are working on it.

A few questions:

1) What will be your use model of this library? Will you run optimization
   phases after linking with the library? If so, what are they?

2) Do you care if the names of functions differ from those in libm? For
   example, it would be gpusin() instead of sin().

3) Do you need a different library for different host platforms? Why?

4) Any other functions (besides math) you want to see in this library?

Thanks.

Yuan



Erik Schnetter

2013-Feb-09 21:20 UTC

The lack of an open-source vector math library (which is what you suggest
here) prompted me to start a project "vecmathlib", available at
<https://bitbucket.org/eschnett/vecmathlib>. This library provides almost
all math functions available in libm, implemented in a vectorised manner,
i.e. suitable for SSE2/AVX/MIC/PTX etc.

In its current state the library has rough edges, e.g. the precision of
many math functions is not yet ideal, and exceptional cases (nan, inf) are
probably not yet all handled correctly. I would be happy if vecmathlib
could be used in LLVM.

For example, assuming that there is a data type "double4" containing a
vector of 4 double precision values, vecmathlib provides a function double4
pow(double4, double4) that implements pow(). In the general case, i.e. if
no system-specific machine instructions are available, this would use
Taylor expansions to calculate pow(x,y)=exp(y*log(x)).
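
As a rough illustration of the identity pow(x,y)=exp(y*log(x)), here is a minimal sketch. It is not vecmathlib's code: the double4 struct and the names pow_via_exp_log and pow4 are hypothetical, std::exp and std::log stand in for the series expansions mentioned above, and negative bases are simply mapped to NaN instead of receiving the integer-exponent handling a real pow() needs.

```cpp
#include <cmath>
#include <limits>

// Scalar kernel: pow(x, y) = exp(y * log(x)), meaningful for x >= 0.
double pow_via_exp_log(double x, double y) {
    if (y == 0.0) return 1.0;  // pow(x, 0) is 1 for any x
    if (x < 0.0) {
        // A full pow() handles integer y for negative x; this sketch does not.
        return std::numeric_limits<double>::quiet_NaN();
    }
    return std::exp(y * std::log(x));
}

// Hypothetical 4-wide vector type, standing in for a SIMD register.
struct double4 {
    double v[4];
};

// Element-wise pow over four lanes; a real vector library would use
// vector instructions instead of a scalar loop.
double4 pow4(const double4& x, const double4& y) {
    double4 r{};
    for (int i = 0; i < 4; ++i) {
        r.v[i] = pow_via_exp_log(x.v[i], y.v[i]);
    }
    return r;
}
```

The accuracy of this formulation is bounded by the accuracy of the underlying exp and log, which is one reason precision is listed above as a rough edge.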

I would be happy to receive feedback on and/or contributions to vecmathlib.

-erik



-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/

Dmitry Mikushin

2013-Feb-17 01:46 UTC

Dear Yuan,

Sorry for the delay in replying.

The answers to your questions depend on where the math library is placed
in the code generation pipeline. At KernelGen, we currently have a
user-level CUDA math module, adopted from cicc internals [1]. It is
intended to be linked with the user LLVM IR module, right before proceeding
with the final optimization and backend. For the last few months we have
been using this method as a temporary workaround for the absence of many
math functions, to keep up the speed of application testing in our compiler
test suite. Supplying math in this way is not portable and introduces many
issues, for instance:
1) The frontend (DragonEgg, in our case) must be taught to emit real math
function calls instead of LLVM intrinsics that NVPTX cannot handle.
2) However, not all intrinsics should be replaced by math calls directly.
For example, there is no cdexp call, but it could be modelled with sincos.
3) Our math module assumes sm_20, and could be inefficient or non-portable
on other families of GPUs.
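
Point 2 can be sketched as follows. This is only an illustration of the identity exp(a + i*b) = exp(a) * (cos(b) + i*sin(b)), not KernelGen's implementation; the function name cdexp_via_sincos is hypothetical, and since standard C++ has no sincos, the sine and cosine are computed by two separate calls where a GPU math library would use a single sincos.

```cpp
#include <cmath>
#include <complex>

// Complex exponential built from real exp, sin and cos:
// exp(a + i*b) = exp(a) * (cos(b) + i*sin(b)).
std::complex<double> cdexp_via_sincos(std::complex<double> z) {
    const double scale = std::exp(z.real());
    // A GPU math library would obtain both values from one sincos() call;
    // standard C++ lacks sincos, so two calls are made here.
    const double s = std::sin(z.imag());
    const double c = std::cos(z.imag());
    return {scale * c, scale * s};
}
```

The result can be checked against std::exp on std::complex<double>, which should agree to within rounding error.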

Instead of this approach, I think the math library should be implemented
*as a lowering pass in the backend*, working directly with intrinsics. In
this case naming is not important, and final optimization is the job of the
backend. But there is another important thing: the backend should codegen
math with respect to accuracy settings, specified either as backend options
or as function attributes (a quite recent addition to LLVM). The accuracy
settings should be:
1) fast-math (ftz, prec-div, prec-sqrt, fma, etc.)
2) Whether or not to use GPU-specific low-precision functions (__sin,
__cos, etc.)
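
A minimal sketch of the second setting, assuming a hypothetical MathOptions struct and a hypothetical selectMathCallee helper (neither exists in LLVM). It only picks a callee name; the fast names follow the CUDA convention for single-precision intrinsics such as __sinf and __cosf, and the set of functions assumed to have a fast variant is deliberately tiny.

```cpp
#include <string>

// Hypothetical accuracy settings; the field name is invented for illustration.
struct MathOptions {
    bool useFastIntrinsics = false;  // allow low-precision GPU-specific variants
};

// Choose the callee for a single-precision math function such as "sin".
std::string selectMathCallee(const std::string& name, const MathOptions& opts) {
    // Only sin and cos are assumed to have a fast variant in this sketch.
    const bool hasFastVariant = (name == "sin" || name == "cos");
    if (opts.useFastIntrinsics && hasFastVariant) {
        return "__" + name + "f";  // e.g. "__sinf"
    }
    return name + "f";  // e.g. "sinf", the accurate library version
}
```

In a real lowering pass the same decision could be driven by backend options or per-function attributes, as suggested above, rather than by a standalone struct.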

Following the latter approach, math handling in NVPTX will conform to the
rest of LLVM, and no host-dependent tweaks will be needed.

I'm also interested in contributing to this development at a reasonable
depth. Doing this part entirely on our own would slow down progress on our
main targets too much; that's why I'm asking for your help and
cooperation.

Best regards,
- Dima.

[1]
https://hpcforge.org/scm/viewvc.php/*checkout*/trunk/src/cuda/include/math.bc?root=kernelgen


Dmitry Mikushin

2013-Feb-17 01:52 UTC

2013/2/8 Yuan Lin <yulin at nvidia.com>
> 4)      Any other functions (besides math) you want to see in this
> library?
>
- Atomics.

Dmitry Mikushin

2013-Feb-17 02:36 UTC

Hi Erik,

1) PTX:
AFAIK, although there is support for vector math in the Fermi GPU ISA (not
to be confused with the abstract PTX ISA), there seem to be no vector forms
for math at the PTX level. However, there are vector forms for data/register
movement, and they are beneficial in some circumstances. But this is a
different topic. So, speaking of NVIDIA GPU math, we currently presume
scalar-only math.

2) MIC:
AFAIK, LLVM does not have an established path for MIC. I've seen a
description of only one patch [1], but its status is unknown, and it seems
it was never committed.

Have you already performed any evaluation on MIC and PTX?

Thanks,
- D.

[1]
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121029/154678.html

