[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

Displaying 20 results from an estimated 6000 matches similar to this thread.

2013 Feb 09
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
The lack of an open-source vector math library (which is what you suggest here) prompted me to start a project, "vecmathlib", available at <https://bitbucket.org/eschnett/vecmathlib>. This library provides almost all math functions available in libm, implemented in a vectorised manner, i.e. suitable for SSE2/AVX/MIC/PTX etc. In its current state the library has rough edges, e.g.
2013 Feb 17
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Dear Yuan, Sorry for the delay in replying. The answers to your questions differ depending on where the math library sits in the code generation pipeline. At KernelGen, we currently have a user-level CUDA math module, adapted from cicc internals [1]. It is intended to be linked with the user's LLVM IR module, right before proceeding with the final optimization and the backend. Over the last few months we
2013 Feb 17
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Hi Justin, I don't understand why, for instance, the X86 backend handles pow automatically, while NVPTX should be a PITA that requires the user to bring their own pow implementation. Even at a very general level, this limits users' interest in the LLVM NVPTX backend. Could you please elaborate on the rationale behind your point? Why, in your opinion, are the accuracy modes I suggested not sufficient? - D.
2013 Feb 17
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
> The issue is really that there is no standard math library for PTX. Well, formally, that may very well be true. Moreover, parts of the CPU math standard are impossible to implement on parallel architectures; consider, for example, errno behavior. But here we are speaking more about the practical side. And the practical side is: for the past 5 years CUDA has claimed to accelerate compute applications, and
2013 Feb 08
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Yes, it helps a lot and we are working on it. A few questions: 1) What will your usage model for this library be? Will you run optimization phases after linking with the library? If so, which ones? 2) Do you care if the names of the functions differ from those in libm? For example, it would be gpusin() instead of sin(). 3) Do you need a different library for different host
2013 Feb 17
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
I would be very hesitant to expose all math library functions as intrinsics. I believe linking with a target-specific math library is the correct approach, as it decouples the back end from the needs of the source program/language. Users should be free to use any math library implementation they choose. Intrinsics are meant for functions that compile down to specific ISA features, like fused
2013 Feb 17
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
The X86 back-end just calls into libm: // Always use a library call for pow. setOperationAction(ISD::FPOW, MVT::f32, Expand); setOperationAction(ISD::FPOW, MVT::f64, Expand); setOperationAction(ISD::FPOW, MVT::f80, Expand); The issue is really that there is no standard math library for PTX. I agree that this is a pain for most users, but I
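For illustration, a minimal LLVM IR sketch of the "link a device math library" model the thread converges on: the pow call is resolved against a device bitcode library linked in before codegen, rather than being expanded into a libm call as on X86. The libdevice-style name __nv_powf and the wrapper @scaled_pow are assumptions for this sketch, not code from the thread.

declare float @__nv_powf(float, float)

; After linking the math bitcode, @__nv_powf is just another device
; function and gets optimized and compiled to PTX along with the module.
define ptx_device float @scaled_pow(float %x, float %y) {
  %p = call float @__nv_powf(float %x, float %y)
  ret float %p
}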
2013 Jun 05
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Dear all, FWIW, I've tested libdevice.compute_20.10.bc and libdevice.compute_30.10.bc from /cuda/nvvm/libdevice shipped with the CUDA 5.5 preview. The IR is compatible with the LLVM 3.4 trunk that we use. Results are correct; performance is almost the same as what we had before with cicc-sniffed IR, or maybe <10% better. We will test libdevice.compute_35.10.bc once we get K20 support. Thanks for
2013 Jun 05
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Thanks for the info! I would be glad to hear of any issues you have encountered on this path. I tried to make sure the 3.3 release was fully compatible with the libdevice implementation shipping with 5.5 (and as far as I know, it is). It's just not an officially supported configuration. Also, I've been meaning to address your -drvcuda issue. How would you feel about making that a part
2012 Sep 06
1
[LLVMdev] [NVPTX] powf intrinsic is unimplemented
Dear all, During app compilation we hit a crash in the NVPTX backend: LLVM ERROR: Cannot select: 0x732b270: i64 = ExternalSymbol'__powisf2' [ID=18] As I understand it, LLVM tries to lower the following call %28 = call ptx_device float @llvm.powi.f32(float 2.000000e+00, i32 %8) nounwind readonly to a device intrinsic. The table llvm/IntrinsicsNVVM.td does not contain such an intrinsic; however, it
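Until the backend (or a linked math library) handles this, a module like the one above has to supply its own powi. Below is a minimal hand-written IR sketch of such a helper (hypothetical name @device_powif, positive exponents only) that powi call sites could be rewritten to call instead of @llvm.powi.f32.

; Naive O(n) loop; returns 1.0 for n <= 0 and ignores negative exponents.
define float @device_powif(float %x, i32 %n) {
entry:
  %ispos = icmp sgt i32 %n, 0
  br i1 %ispos, label %loop, label %done

loop:
  %i = phi i32 [ %n, %entry ], [ %i.next, %loop ]
  %acc = phi float [ 1.0, %entry ], [ %acc.next, %loop ]
  %acc.next = fmul float %acc, %x
  %i.next = sub i32 %i, 1
  %again = icmp sgt i32 %i.next, 0
  br i1 %again, label %loop, label %done

done:
  %res = phi float [ 1.0, %entry ], [ %acc.next, %loop ]
  ret float %res
}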
2016 Jul 01
2
Missing TargetPrefix for NVVM intrinsics
Justins: I noticed that the intrinsics in IntrinsicsNVVM don't specify a TargetPrefix. This seems like a simple omission, so I was going to simply throw a `let TargetPrefix = "nvvm"` block around them, but this doesn't quite work. There seem to be three prefixes used in this file: about 900 intrinsics are int_nvvm_*, 30 are int_ptx_*, and 1 is int_cuda. It isn't clear to me
2013 Apr 01
2
[LLVMdev] [NVPTX] launch_bounds support?
Dear all, Is anybody working on CUDA launch bounds support? At the PTX level, __attribute__((launch_bounds(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP))) should be emitted as a .maxntid / .minnctapersm specification. Thanks, - D.
2016 Oct 14
2
LLVM/CLANG: CUDA compilation fail for inline assembly code
Hi, I am sorry for sending this query again here, but maybe I sent it to the wrong list yesterday. I am trying to compile the LonestarGPU-rev2.0 <http://iss.ices.utexas.edu/?p=projects/galois/lonestargpu/download> benchmark suite with LLVM/Clang. This suite has the following piece of code (more info here
2013 Mar 01
4
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
I'm building this with llvm-c, and accessing these intrinsics by calling the intrinsic as if it were a function. class F_SREG<string OpStr, NVPTXRegClass regclassOut, Intrinsic IntOp> : NVPTXInst<(outs regclassOut:$dst), (ins), OpStr, [(set regclassOut:$dst, (IntOp))]>; def INT_PTX_SREG_TID_X : F_SREG<"mov.u32 \t$dst, %tid.x;",
2012 May 01
2
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
> -----Original Message----- > From: Dan Bailey [mailto:dan at dneg.com] > Sent: Sunday, April 29, 2012 8:46 AM > To: Justin Holewinski > Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu; Vinod Grover; > llvmdev at cs.uiuc.edu > Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend > > Justin, > > Firstly, this is great! It seems to be so much further forward in
2013 Mar 01
1
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
The identifier INT_PTX_SREG_TID_X is the name of an instruction as the back-end sees it, and has very little to do with the name you should use in your IR. Your best bet is to look at the include/llvm/IR/IntrinsicsNVVM.td file and see the definitions for each intrinsic. Then, the name mapping is just: int_foo_bar -> llvm.foo.bar(); the int_ prefix becomes llvm., and all underscores turn into
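Spelled out for the thread-index special register in question, that mapping gives the following minimal IR sketch; the wrapper @my_tid_x is only for illustration, and the intrinsic name matches the one Pete gives later in this listing.

declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()

define i32 @my_tid_x() {
  ; Mapped from the TableGen name int_nvvm_read_ptx_sreg_tid_x.
  %x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  ret i32 %x
}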
2013 Apr 02
0
[LLVMdev] [NVPTX] launch_bounds support?
Yes, this is supported through metadata. An example usage of these annotations is given in the test/CodeGen/NVPTX/annotations.ll unit test. I'll try to remember to add this to the NVPTX documentation I'm putting together at http://llvm.org/docs/NVPTXUsage.html. On Mon, Apr 1, 2013 at 8:06 AM, Dmitry Mikushin <dmitry at kernelgen.org> wrote: > Dear all, > > Is anybody
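A minimal sketch of what those annotations look like in IR, written in the LLVM 3.x-era metadata syntax of this thread: the kernel @my_kernel is made up, and the property names "maxntidx" and "minctasm" (for .maxntid / .minnctapersm) are my reading of what the backend expects, so check annotations.ll for the authoritative form.

define ptx_kernel void @my_kernel() {
  ret void
}

; One annotation tuple per property: (function, property name, value).
!nvvm.annotations = !{!0, !1}
!0 = metadata !{void ()* @my_kernel, metadata !"maxntidx", i32 256}
!1 = metadata !{void ()* @my_kernel, metadata !"minctasm", i32 4}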
2012 May 02
0
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
(HTML-formatted message) Justin Holewinski wrote:
2013 Mar 01
0
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
Hi Timothy, I'm not sure what you mean by this working for other intrinsics, but in this case, I think you want the intrinsic name llvm.nvvm.read.ptx.sreg.tid.x. For me, this looks like: %x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() Pete On Fri, Mar 1, 2013 at 11:51 AM, Timothy Baldridge <tbaldridge at gmail.com> wrote: > I'm building this with llvm-c, and accessing these
2012 Apr 27
2
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
Thanks for the feedback! The attached patch addresses the style issues that have been found. From: Jim Grosbach [mailto:grosbach at apple.com] Sent: Wednesday, April 25, 2012 2:22 PM To: Justin Holewinski Cc: llvm-commits at cs.uiuc.edu; llvmdev at cs.uiuc.edu; Vinod Grover Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend Hi Justin, Cool stuff, to be sure. Excited to see this. As a