thr3ads.net - similar to: "making cuda-based R package CRAN friendly"

Displaying 20 results from an estimated 6000 matches similar to: "making cuda-based R package CRAN friendly"

2020 Jul 30

Status of CUDA 11 support

Hi, I work in a large CUDA codebase and use Clang to build some of our CUDA code to improve compilation speed. We're planning to upgrade to CUDA 11 soon, and it appears that CUDA 11 is not yet supported in LLVM. From the LLVM commit history, I can see that work on CUDA 11 has started. Is this currently being worked on? What is the remaining work left? And is any help needed to finish

2018 Mar 23

cuda cross compiling issue for target aarch64-linux-androideabi

+Artem Belevich <tra at google.com> On Fri, Mar 23, 2018 at 7:53 PM Bharath Bhoopalam via llvm-dev < llvm-dev at lists.llvm.org> wrote: > I was wondering if anyone has encountered this issue when cross compiling > cuda on Nvidia TX2 running android. > > The error is > In file included from <built-in>:1: > In file included from >

2018 Mar 23

cuda cross compiling issue for target aarch64-linux-androideabi

I was wondering if anyone has encountered this issue when cross compiling cuda on Nvidia TX2 running android. The error is In file included from <built-in>:1: In file included from prebuilts/clang/host/linux-x86/clang-4667116/lib64/clang/7.0.1/include/__clang_cuda_runtime_wrapper.h:219: ../cuda/targets/aarch64-linux-androideabi/include/math_functions.hpp:3477:19: error: no matching function

2014 Jun 03

cuda-memcheck to debug CUDA-enabled R packages

I'm building a simple R extension around a CUDA-enabled dynamic library, and I want to run the whole package with cuda-memcheck for debugging purposes. I can run it just fine with Valgrind: $ R --no-save -d valgrind < test.R However, if I try the same thing with cuda-memcheck, $ R --no-save -d cuda-memcheck < test.R I get: *** Further command line arguments ('--no-save ')

2011 Feb 25

Missing R.h

Hi, I'm trying to install a module - gputools - and keep getting compile time errors about missing R.h Does anyone know where this file can be found? Thanks!

2017 Oct 06

CUDA tools?

On Thu, 2017-10-05 at 17:07 -0400, m.roth at 5-cent.us wrote: > vychytraly . wrote: > > On Thu, Oct 5, 2017 at 9:51 PM, <m.roth at 5-cent.us> wrote: > > > > > > So, kmod-nvidia installed. Trouble is, I have no tool to test it. And my > > > user might need nvcc, which, of course, is only provided by the NVidia > > > CUDA, which won't install,

2017 Jun 14

[CUDA] Lost debug information when compiling CUDA code

Hi, I needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck. Specifically, below is what I did: 1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu; 2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated

2020 Nov 19

JIT compiling CUDA source code

I have made a bit of progress... When compiling CUDA source code in memory, the Compilation instance returned by Driver::BuildCompilation() contains two clang Commands: one for the host and one for the CUDA device. I can execute both commands using EmitLLVMOnlyActions. I add the Module from the host compilation to my JIT as usual, but... what to do with the Module from the device compilation? If I

2016 Dec 21

llvm/cuda: Indentify kernel functions and optimizations

Hi, I am trying to instrument CUDA kernel functions only (llvm-3.9.0). Is there a way to identify CUDA kernel functions? I see that the LLVM IR for CUDA has an nvvm annotations section, where kernel functions are identified for NVPTX usage. I can parse the whole IR for this kernel metadata and then proceed, but this is very clumsy. The other way is to work with cuda-device-only IR. But then I am not

2013 Nov 18

Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success!

Hi everyone, after following in the footsteps of the following discussion (http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html) I had been able to turn my GTX 480 into a Quadro 6000. When I VT-d passthrough it to a Debian jessie VM it shows up fine and CUDA 5.5 seems to function properly up to a point: lspci -v: 00:04.0 VGA compatible controller: NVIDIA Corporation GF100GL

2016 Dec 21

llvm/cuda: Indentify kernel functions and optimizations

https://github.com/llvm-mirror/llvm/blob/652375a8cc49615de31fd9d424753795059185b6/lib/Target/NVPTX/NVPTXUtilities.h#L58 Does this solve your problem? On Wed, Dec 21, 2016 at 2:29 PM, Gurunath Kadam via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi, > > I am trying to instrument CUDA kernel functions only (llvm-3.9.0). > > Is there a way to identify cuda kernel

2020 Sep 24

cuda __shfl_sync problem

Hi, First of all, I'm not sure if I should be posting this here or in cfe-dev, but here it goes. In order to instrument CUDA kernels I first generate device IR with: clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc I also have a library that contains the instrumentation stubs, for which I generate IR similarly, and I link it with the device IR

2020 Nov 19

JIT compiling CUDA source code

Sounds right now like you are emitting an LLVM module? The best strategy is probably to emit a PTX module and then pass that to the CUDA driver. This is what we do on the Julia side in CUDA.jl. Nvidia has a somewhat helpful tutorial on this at https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp and

2020 Sep 25

cuda __shfl_sync problem

Do you mean in llc? Because I don't see such an option, I'm afraid. ~George On 24-09-2020 20:54, Johannes Doerfert wrote: > Not that I am an expert but it looks like it defaults to the minimal > PTX version that supports the compute capability. You might be able to > choose PTX 6.0 though. > > ~ Johannes > > > On 9/24/20 1:02 PM, George K via llvm-dev wrote:

2015 Apr 08

[LLVMdev] CUDA front-end (CUDA to LLVM IR)

On Wed, Apr 8, 2015 at 10:12 AM, Dmitry Mikushin <dmitry at kernelgen.org> wrote: > A tool of this kind here: https://github.com/apc-llc/nvcc-llvm-ir > > 2015-04-08 19:01 GMT+02:00 Ahmed ElTantawy <ahmede at ece.ubc.ca>: > >> Hi, >> >> I wanted to ask whether there is ongoing effort (or an already >> established tool) that enables to convert CUDA

2016 Oct 27

problem on compiling cuda program with clang++

(+llvm-dev) My question was whether your host machine, the one which is running the compiler, is ARM (as opposed to x86 or POWER). The header you pointed to was in "aarch64-linux-gnu", which made me think you might be on an ARM system. If you are not running linux x86, it is not likely to work. If you are running linux x86, we will need much more details about your system in order to

2017 Aug 02

CUDA compilation "No available targets are compatible with this triple." problem

Yes, I followed the guide. The same error showed up: >clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 -I/usr/local/cuda/include -lcudart_static -ldl -lrt -pthread error: unable to create target: 'No available targets are compatible with this triple.' ________________________________ From: Kevin Choi <code.kchoi at gmail.com> Sent: Wednesday, August 2,

2016 Oct 27

problem on compiling cuda program with clang++

Hi, it looks like you're compiling CUDA for an ARM host? This is not a configuration we have tested, nor is it something we have the capability of testing at the moment. You may be able to make it work by providing the appropriate -isystem flags to clang so that it can find your headers, but who knows, it may be more complicated than that. Regards, -Justin On Wed, Oct 26, 2016 at 9:59 PM,

2011 Feb 10

BLAS optimization by CUBLAS

Dear colleagues! In early 2009 there was a discussion about a fast BLAS library, initiated by Sachin. He reported a faster BLAS library based on Nvidia's CUBLAS library. Uwe Ligges showed an interest in placing the optimized rblas.dll into the windows/contrib section managed by him. Unfortunately there is no CUBLAS version of rblas.dll in this section at present. So, is anybody interested in CUBLAS

2017 Sep 27

OrcJIT + CUDA Prototype for Cling

Dear LLVM-Developers and Vinod Grover, we are trying to extend the cling C++ interpreter (https://github.com/root-project/cling) with CUDA functionality for Nvidia GPUs. I already developed a prototype based on OrcJIT and am seeking feedback. I am currently stuck with a runtime issue, in which my interpreter prototype fails to execute kernels with a CUDA runtime error. === How to use the
