[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

Displaying 20 results from an estimated 6000 matches similar to this thread.

2013 Feb 09
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
The lack of an open-source vector math library (which is what you suggest here) prompted me to start a project, "vecmathlib", available at <https://bitbucket.org/eschnett/vecmathlib>. This library provides almost all math functions available in libm, implemented in a vectorised manner, i.e. suitable for SSE2/AVX/MIC/PTX etc. In its current state the library has rough edges, e.g.
2013 Feb 17
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Dear Yuan, Sorry for the delay in replying. The answers to your questions differ depending on where the math library sits in the code generation pipeline. At KernelGen, we currently have a user-level CUDA math module, adapted from cicc internals [1]. It is intended to be linked with the user's LLVM IR module, right before proceeding with the final optimization and the backend. Over the last few months we
2013 Feb 17
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Hi Justin, I don't understand why, for instance, the X86 backend handles pow automatically, while NVPTX should be a PITA that requires the user to bring their own pow implementation. Even at a very general level, this limits users' interest in the LLVM NVPTX backend. Could you please elaborate on the rationale behind your point? Why, in your opinion, are the accuracy modes I suggested not sufficient? - D.
2013 Feb 17
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
> The issue is really that there is no standard math library for PTX. Well, formally, that may very well be true. Moreover, parts of the CPU math standard are impossible to implement on parallel architectures; consider, for example, errno behavior. But here we are speaking more about the practical side. And the practical side is: for the past 5 years CUDA has claimed to accelerate compute applications, and
2013 Feb 08
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Yes, it helps a lot and we are working on it. A few questions: 1) What will your usage model for this library be? Will you run optimization phases after linking with the library? If so, which ones? 2) Do you care if the names of the functions differ from those in libm? For example, it would be gpusin() instead of sin(). 3) Do you need a different library for different host
2013 Feb 17
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
I would be very hesitant to expose all math library functions as intrinsics. I believe linking with a target-specific math library is the correct approach, as it decouples the back end from the needs of the source program/language. Users should be free to use any math library implementation they choose. Intrinsics are meant for functions that compile down to specific ISA features, like fused
2013 Feb 17
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
The X86 back-end just calls into libm: // Always use a library call for pow. setOperationAction(ISD::FPOW, MVT::f32, Expand); setOperationAction(ISD::FPOW, MVT::f64, Expand); setOperationAction(ISD::FPOW, MVT::f80, Expand); The issue is really that there is no standard math library for PTX. I agree that this is a pain for most users, but I
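For illustration, a minimal LLVM IR sketch of the "link a device math library" model the thread converges on: the pow call is resolved against a device bitcode library linked in before codegen, rather than being expanded into a libm call as on X86. The libdevice-style name __nv_powf and the wrapper @scaled_pow are assumptions for this sketch, not code from the thread.

declare float @__nv_powf(float, float)

; After linking the math bitcode, @__nv_powf is just another device
; function and gets optimized and compiled to PTX along with the module.
define ptx_device float @scaled_pow(float %x, float %y) {
  %p = call float @__nv_powf(float %x, float %y)
  ret float %p
}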
2013 Jun 05
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Dear all, FWIW, I've tested libdevice.compute_20.10.bc and libdevice.compute_30.10.bc from /cuda/nvvm/libdevice shipped with the CUDA 5.5 preview. The IR is compatible with the LLVM 3.4 trunk that we use. Results are correct; performance is almost the same as what we had before with cicc-sniffed IR, or maybe <10% better. We will test libdevice.compute_35.10.bc once we get K20 support. Thanks for
2013 Jun 05
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Thanks for the info! I would be glad to hear of any issues you have encountered on this path. I tried to make sure the 3.3 release was fully compatible with the libdevice implementation shipping with 5.5 (and as far as I know, it is). It's just not an officially supported configuration. Also, I've been meaning to address your -drvcuda issue. How would you feel about making that a part
2012 Sep 06
1
[LLVMdev] [NVPTX] powf intrinsic is unimplemented
Dear all, During app compilation we hit a crash in the NVPTX backend: LLVM ERROR: Cannot select: 0x732b270: i64 = ExternalSymbol'__powisf2' [ID=18] As I understand it, LLVM tries to lower the following call %28 = call ptx_device float @llvm.powi.f32(float 2.000000e+00, i32 %8) nounwind readonly to a device intrinsic. The table llvm/IntrinsicsNVVM.td does not contain such an intrinsic; however, it
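Until the backend (or a linked math library) handles this, a module like the one above has to supply its own powi. Below is a minimal hand-written IR sketch of such a helper (hypothetical name @device_powif, positive exponents only) that powi call sites could be rewritten to call instead of @llvm.powi.f32.

; Naive O(n) loop; returns 1.0 for n <= 0 and ignores negative exponents.
define float @device_powif(float %x, i32 %n) {
entry:
  %ispos = icmp sgt i32 %n, 0
  br i1 %ispos, label %loop, label %done

loop:
  %i = phi i32 [ %n, %entry ], [ %i.next, %loop ]
  %acc = phi float [ 1.0, %entry ], [ %acc.next, %loop ]
  %acc.next = fmul float %acc, %x
  %i.next = sub i32 %i, 1
  %again = icmp sgt i32 %i.next, 0
  br i1 %again, label %loop, label %done

done:
  %res = phi float [ 1.0, %entry ], [ %acc.next, %loop ]
  ret float %res
}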
2016 Jul 01
2
Missing TargetPrefix for NVVM intrinsics
Justins: I noticed that the intrinsics in IntrinsicsNVVM don't specify a TargetPrefix. This seems like a simple omission, so I was going to simply throw a `let TargetPrefix = "nvvm"` block around them, but this doesn't quite work. There seem to be three prefixes used in this file: about 900 intrinsics are int_nvvm_*, 30 are int_ptx_*, and 1 is int_cuda. It isn't clear to me
2013 Apr 01
2
[LLVMdev] [NVPTX] launch_bounds support?
Dear all, Is anybody working on CUDA launch bounds support? At the PTX level, __attribute__((launch_bounds(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP))) should be emitted as a .maxntid / .minnctapersm specification. Thanks, - D.
2016 Oct 14
2
LLVM/CLANG: CUDA compilation fail for inline assembly code
Hi, I am sorry for sending this query again here, but maybe I sent it to the wrong list yesterday. I am trying to compile the LonestarGPU-rev2.0 <http://iss.ices.utexas.edu/?p=projects/galois/lonestargpu/download> benchmark suite with LLVM/Clang. This suite has the following piece of code (more info here
2013 Mar 01
4
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
I'm building this with llvm-c, and accessing these intrinsics by calling the intrinsic as if it were a function. class F_SREG<string OpStr, NVPTXRegClass regclassOut, Intrinsic IntOp> : NVPTXInst<(outs regclassOut:$dst), (ins), OpStr, [(set regclassOut:$dst, (IntOp))]>; def INT_PTX_SREG_TID_X : F_SREG<"mov.u32 \t$dst, %tid.x;",
2012 May 01
2
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
> -----Original Message----- > From: Dan Bailey [mailto:dan at dneg.com] > Sent: Sunday, April 29, 2012 8:46 AM > To: Justin Holewinski > Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu; Vinod Grover; > llvmdev at cs.uiuc.edu > Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend > > Justin, > > Firstly, this is great! It seems to be so much further forward in
2013 Mar 01
1
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
The identifier INT_PTX_SREG_TID_X is the name of an instruction as the back-end sees it, and has very little to do with the name you should use in your IR. Your best bet is to look at the include/llvm/IR/IntrinsicsNVVM.td file and see the definitions for each intrinsic. Then, the name mapping is just: int_foo_bar -> llvm.foo.bar(); the int_ prefix becomes llvm., and all underscores turn into
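Spelled out for the thread-index special register in question, that mapping gives the following minimal IR sketch; the wrapper @my_tid_x is only for illustration, and the intrinsic name matches the one Pete gives later in this listing.

declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()

define i32 @my_tid_x() {
  ; Mapped from the TableGen name int_nvvm_read_ptx_sreg_tid_x.
  %x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  ret i32 %x
}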
2013 Apr 02
0
[LLVMdev] [NVPTX] launch_bounds support?
Yes, this is supported through metadata. An example usage of these annotations is given in the test/CodeGen/NVPTX/annotations.ll unit test. I'll try to remember to add this to the NVPTX documentation I'm putting together at http://llvm.org/docs/NVPTXUsage.html. On Mon, Apr 1, 2013 at 8:06 AM, Dmitry Mikushin <dmitry at kernelgen.org> wrote: > Dear all, > > Is anybody
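A minimal sketch of what those annotations look like in IR, written in the LLVM 3.x-era metadata syntax of this thread: the kernel @my_kernel is made up, and the property names "maxntidx" and "minctasm" (for .maxntid / .minnctapersm) are my reading of what the backend expects, so check annotations.ll for the authoritative form.

define ptx_kernel void @my_kernel() {
  ret void
}

; One annotation tuple per property: (function, property name, value).
!nvvm.annotations = !{!0, !1}
!0 = metadata !{void ()* @my_kernel, metadata !"maxntidx", i32 256}
!1 = metadata !{void ()* @my_kernel, metadata !"minctasm", i32 4}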
2012 May 02
0
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
(HTML-formatted message) Justin Holewinski wrote:
2013 Mar 01
0
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
Hi Timothy, I'm not sure what you mean by this working for other intrinsics, but in this case, I think you want the intrinsic name llvm.nvvm.read.ptx.sreg.tid.x. For me, this looks like: %x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() Pete On Fri, Mar 1, 2013 at 11:51 AM, Timothy Baldridge <tbaldridge at gmail.com> wrote: > I'm building this with llvm-c, and accessing these
2012 Apr 27
2
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
Thanks for the feedback! The attached patch addresses the style issues that have been found. From: Jim Grosbach [mailto:grosbach at apple.com] Sent: Wednesday, April 25, 2012 2:22 PM To: Justin Holewinski Cc: llvm-commits at cs.uiuc.edu; llvmdev at cs.uiuc.edu; Vinod Grover Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend Hi Justin, Cool stuff, to be sure. Excited to see this. As a