thr3ads.net - similar to: "Separate compilation of CUDA code?"

Displaying 20 results from an estimated 1000 matches similar to: "Separate compilation of CUDA code?"

NVPTX Back-end: relocatable device code support for dynamic parallelism

2017 Jun 09

Hi everyone, CUDA also allows some runtime functions to be called from device code. On a multi-GPU system this lets the GPU determine its own device ID via cudaGetDevice(). Unfortunately, I cannot get it working when compiling with clang. When compiling with nvcc, relocatable device code needs to be enabled (-rdc=true) and cudadevrt is needed when linking [0]. I did not

[CUDA] Lost debug information when compiling CUDA code

2017 Jun 14

Hi, I needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck. Specifically, below is what I did: 1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu; 2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated

Separate compilation of CUDA code?

2017 Jun 17

Hi, I wonder whether the current version of LLVM supports separate compilation and linking of device code, i.e., is there a flag analogous to nvcc's --relocatable-device-code flag? If not, is there any plan to support this? Thanks! Yuanfeng Peng

[GPUCC] link against libdevice

2016 Aug 01

OK, I see the problem. You were right that we weren't picking up libdevice. CUDA 7.0 only ships with the following libdevice binaries (found in /path/to/cuda/nvvm/libdevice): libdevice.compute_20.10.bc, libdevice.compute_30.10.bc, libdevice.compute_35.10.bc. If you ask for sm_50 with CUDA 7.0, clang can't find a matching libdevice binary, and it will apparently silently give up and try to

[GPUCC] link against libdevice

2016 Aug 01

Hi Justin, Thanks for your response! The clang & llvm I'm using were built from source. Below is the output of compiling with -v. Any suggestions would be appreciated! *clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)* *Target: x86_64-unknown-linux-gnu* *Thread model: posix* *InstalledDir: /usr/local/bin* *Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8*

[GPUCC] link against libdevice

2016 Aug 01

Hi, Yuanfeng. What version of clang are you using? CUDA is only known to work at tip of head, so you must build clang yourself from source. I suspect that's your problem, but if building from source doesn't fix it, please attach the output of compiling with -v. Regards, -Justin On Sun, Jul 31, 2016 at 9:24 PM, Chandler Carruth <chandlerc at google.com> wrote: > Directly

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw that nvcc does that, but I'm not sure it's valid. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining: if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...

[LLVMdev] CUDA front-end (CUDA to LLVM IR)

2015 Apr 08

Hi, I wanted to ask whether there is an ongoing effort (or an already established tool) to convert CUDA kernels (which use CUDA-specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I am aware that I can do this for OpenCL with the help of libclc, but I cannot find anything similar for CUDA. Thanks

problem on compiling cuda program with clang++

2016 Oct 27

Hi all, I compiled the *llvm3.9* source code on the *Nvidia TX1* board, and I am now following docs/CompileCudaWithLLVM.rst to compile a CUDA program with clang++. When I compile `axpy.cu` using `nvcc`, *nvcc* generates the correct binary; when compiling `axpy.cu` using clang++, the detailed command is `clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_53

Help needed using 3rd party C library/functions from within R (Nvidia CUDA)

2008 Nov 04

Hello, I'm trying to use the parallel computing power available through NVIDIA CUDA (www.nvidia.com/cuda) from within R. CUDA is an extension to the C language, so I thought it would be possible to do this. If I have a C file with an empty function which includes a needed CUDA library (cutil.h) and compile this to an .so file using an NVIDIA compiler (nvcc), called 'myFunc.so' I

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2; bra.uni

LLVM/CLANG: CUDA compilation fail for inline assembly code

2016 Oct 14

Hi, I am sorry for sending this query again here, but I may have sent it to the wrong list yesterday. I am trying to compile the LonestarGPU-rev2.0 <http://iss.ices.utexas.edu/?p=projects/galois/lonestargpu/download> benchmark suite with LLVM/CLANG. This suite has the following piece of code (more info here

[LLVMdev] CUDA front-end (CUDA to LLVM IR)

2015 Apr 08

On Wed, Apr 8, 2015 at 10:12 AM, Dmitry Mikushin <dmitry at kernelgen.org> wrote: > A tool of this kind here: https://github.com/apc-llc/nvcc-llvm-ir > > 2015-04-08 19:01 GMT+02:00 Ahmed ElTantawy <ahmede at ece.ubc.ca>: > >> Hi, >> >> I wanted to ask whether there is ongoing effort (or an already >> established tool) that enables to convert CUDA

CUDA tools?

2017 Oct 05

Hi, again. So, kmod-nvidia installed. Trouble is, I have no tool to test it. And my user might need nvcc, which, of course, is only provided by the NVidia CUDA, which won't install, because it conflicts with kmod-nvidia. Has *anyone* dealt with this? If so, what was your solution? mark

[LLVMdev] Cuda programs on LLVM

2011 Aug 15

Hello, how do I execute a CUDA program using LLVM? More specifically, nvcc produces some temporary files during its compilation. I want to convert the .cu.cpp file to .ll format and optimize it. The .cu.cpp file contains typedefs and enums used by the CUDA runtime as well as the host part of the code, and the ptx file contains the kernel definition. How can I run the program after optimization? Will Rhodin

[LLVMdev] translating from OpenMP to CUDA

2012 Nov 08

Hi, Is it possible to translate an OpenMP program to CUDA using LLVM? I read that dragonegg has an OpenMP front-end and LLVM has a PTX back-end. I don't know how mature these tools are. Please let me know. Thanks. -Apala Postdoctoral Scholar Department of Computer Science, University of Chicago Computation Institute, Argonne National Laboratory http://sites.google.com/site/apalaguha/home/

question about Makeconf and nvcc/CUDA

2013 Jul 18

Dear R development: I'm not sure if this is the appropriate list, but it's a start. I would like to put together a package which contains a CUDA program on Windows 7. I believe that it has to do with the Makeconf file in the etc directory. But when I just use the nvcc with the shared option, I can use the dyn.load command, but when I use the is.loaded function, it shows FALSE.

[LLVMdev] translating from OpenMP to CUDA

2012 Nov 09

The PTX back-end is robust (it's based on the sources used by nvcc), but I'm not sure about the OpenMP representation in LLVM IR. I believe the OpenMP constructs are already lowered into libgomp calls before leaving DragonEgg. It's been a while since I've looked at it though. If you use the PTX back-end and have any issues, please don't hesitate to post to the list and cc:

instrumenting device code with gpucc

2016 Mar 15

Hi Jingyue, Sorry to ask again, but how exactly could I glue the fatbin with the instrumented host code? Or does it mean we actually cannot instrument both the host & device code at the same time? Thanks! yuanfeng On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote: > Including fatbin into host code should be done in frontend. > > On Mon, Mar 14, 2016

instrumenting device code with gpucc

2016 Mar 12

Hey Jingyue, Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse the nvvm-reflect pass? Thanks! yuanfeng On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote: > According to the examples you
