thr3ads.net - similar to: "Separate compilation of CUDA code?"

Displaying 20 results from an estimated 3000 matches similar to: "Separate compilation of CUDA code?"

2017 Jun 14

Separate compilation of CUDA code?

Hi, I wonder whether the current version of LLVM supports separate compilation and linking of device code, i.e., is there a flag analogous to nvcc's --relocatable-device-code flag? If not, is there any plan to support this? Thanks! Yuanfeng Peng

2016 Oct 27

problem on compiling cuda program with clang++

Hi all, I compiled the *llvm3.9* source code on the *Nvidia TX1* board, and I am now following docs/CompileCudaWithLLVM.rst to compile a CUDA program with clang++. When I compile `axpy.cu` using `nvcc`, *nvcc* generates the correct binary; when compiling `axpy.cu` using clang++, the detailed command is `clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_53

2017 Jun 14

[CUDA] Lost debug information when compiling CUDA code

Hi, I needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck. Specifically, below is what I did: 1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu; 2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated

2017 Aug 02

CUDA compilation "No available targets are compatible with this triple." problem

Hi, I have trouble compiling CUDA code with Clang. The following is a command I tried: > clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 --cuda-path=/usr/local/cuda The error message is error: unable to create target: 'No available targets are compatible with this triple.' The info of the LLVM I'm using is as follows: > clang++ --version clang version 6.0.0
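
One possible cause of this error, though the truncated thread does not confirm it, is an LLVM build configured without the NVPTX backend. A minimal shell sketch for checking this, assuming `llc` from the same LLVM installation as `clang++` is on `PATH`; `has_nvptx` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: succeeds if the text on stdin (in the style of
# `llc --version` output) mentions an NVPTX target.
has_nvptx() {
  grep -qi 'nvptx'
}

# Assumption: llc belongs to the same LLVM build as the clang++ in use.
if command -v llc >/dev/null 2>&1; then
  if llc --version | has_nvptx; then
    echo "NVPTX backend is registered"
  else
    echo "NVPTX backend is missing from this LLVM build"
  fi
fi
```

If the backend turns out to be missing, no combination of command-line flags is likely to help, and the LLVM build itself would need to include that target.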

2016 Oct 27

problem on compiling cuda program with clang++

(+llvm-dev) My question was whether your host machine, the one which is running the compiler, is ARM (as opposed to x86 or POWER). The header you pointed to was in "aarch64-linux-gnu", which made me think you might be on an ARM system. If you are not running Linux x86, it is not likely to work. If you are running Linux x86, we will need many more details about your system in order to

2017 Aug 02

CUDA compilation "No available targets are compatible with this triple." problem

Yes, I followed the guide. The same error showed up: >clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 -I/usr/local/cuda/include -lcudart_static -ldl -lrt -pthread error: unable to create target: 'No available targets are compatible with this triple.' ________________________________ From: Kevin Choi <code.kchoi at gmail.com> Sent: Wednesday, August 2,

2016 Aug 01

[GPUCC] link against libdevice

OK, I see the problem. You were right that we weren't picking up libdevice. CUDA 7.0 only ships with the following libdevice binaries (found in /path/to/cuda/nvvm/libdevice): libdevice.compute_20.10.bc, libdevice.compute_30.10.bc, libdevice.compute_35.10.bc. If you ask for sm_50 with cuda 7.0, clang can't find a matching libdevice binary, and it will apparently silently give up and try to
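
The situation described in this message can be checked by hand. A minimal shell sketch, assuming the `nvvm/libdevice` directory layout and the `libdevice.compute_XX.YY.bc` naming quoted in the message; `find_libdevice` is a hypothetical helper, the `/usr/local/cuda` default is an assumed install prefix, and the exact-match rule is a simplification rather than clang's actual search logic:

```shell
# Hypothetical helper: list libdevice bitcode files for one compute capability.
#   $1 = CUDA install prefix (assumed, e.g. /usr/local/cuda)
#   $2 = compute capability digits (e.g. 35 for sm_35)
# Prints matching paths; returns non-zero when nothing matches.
find_libdevice() {
  ls "$1"/nvvm/libdevice/libdevice.compute_"$2".*.bc 2>/dev/null
}

# Usage: look for a compute_50 bitcode file, as in the sm_50 case above.
find_libdevice "${CUDA_PATH:-/usr/local/cuda}" 50 \
  || echo "no libdevice bitcode for compute_50 found"
```

With the three files listed for CUDA 7.0, a query for `35` would print one path, while a query for `50` would fall through to the message.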

2018 Dec 14

Debug info for CUDA code

Are you planning to release this as soon as it's ready, or do you want to make it part of a major release? Is it possible to let me know (maybe by replying to this thread) once the code is ready? I know sometimes it takes a while to get things into a major release. I greatly appreciate your work on this! Thanks, Char On 2018-12-15 05:19:50, "Alexey Bataev" <a.bataev at outlook.com>

2019 Jan 23

Debug info for CUDA code

Hi Char, I found the problem: for some reason the last patch was applied incorrectly. Just committed the fixed version. Tried to compile axpy.cu, everything works. ------------- Best regards, Alexey Bataev 23.01.2019 13:37, treinz wrote: > Hi Alexey, > > I tried the b7195a6 from the llvm github mirror, which does include > your commit D46189 <https://reviews.llvm.org/D46189> (see

2018 May 01

Compiling CUDA with clang on Windows

Dear all, In the official document <https://llvm.org/docs/CompileCudaWithLLVM.html>, it is mentioned that CUDA compilation is supported on Windows as of 2017-01-05. I used msys2 to install clang 5.0.1. Then I installed CUDA 8.0. However, I could not compile any CUDA code with the prescribed settings. I wonder whether anyone has successfully compiled CUDA code with clang on Windows.

2016 Oct 27

problem on compiling cuda program with clang++

Hi, it looks like you're compiling CUDA for an ARM host? This is not a configuration we have tested, nor is it something we have the capability of testing at the moment. You may be able to make it work by providing the appropriate -isystem flags to clang so that it can find your headers, but who knows, it may be more complicated than that. Regards, -Justin On Wed, Oct 26, 2016 at 9:59 PM,

2016 Aug 01

[GPUCC] link against libdevice

Hi Justin, Thanks for your response! The clang & llvm I'm using was built from source. Below is the output of compiling with -v. Any suggestions would be appreciated! *clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)* *Target: x86_64-unknown-linux-gnu* *Thread model: posix* *InstalledDir: /usr/local/bin* *Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8*

2016 Aug 01

[GPUCC] link against libdevice

Hi, Yuanfeng. What version of clang are you using? CUDA is only known to work at tip of head, so you must build clang yourself from source. I suspect that's your problem, but if building from source doesn't fix it, please attach the output of compiling with -v. Regards, -Justin On Sun, Jul 31, 2016 at 9:24 PM, Chandler Carruth <chandlerc at google.com> wrote: > Directly

2008 Nov 04

Help needed using 3rd party C library/functions from within R (Nvidia CUDA)

Hello, I'm trying to use the parallel computing power available through NVIDIA CUDA (www.nvidia.com/cuda) from within R. CUDA is an extension to the C language, so I thought it would be possible to do this. If I have a C file with an empty function which includes a needed CUDA library (cutil.h) and compile this to an .so file using an NVIDIA compiler (nvcc), called 'myFunc.so' I

2017 Jun 09

NVPTX Back-end: relocatable device code support for dynamic parallelism

Hi everyone, CUDA also allows calling some runtime functions from device code. On a multi-GPU system this allows the GPU to determine its device id on its own via cudaGetDevice(). Unfortunately I cannot get it working when compiling with clang. When compiling with nvcc, relocatable device code needs to be set to true (-rdc=true) and cudadevrt is needed when linking [0]. I did not

2016 Mar 15

instrumenting device code with gpucc

Hi Jingyue, Sorry to ask again, but how exactly could I glue the fatbin with the instrumented host code? Or does it mean we actually cannot instrument both the host & device code at the same time? Thanks! yuanfeng On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote: > Including fatbin into host code should be done in frontend. > > On Mon, Mar 14, 2016

2016 Mar 13

instrumenting device code with gpucc

Hey Jingyue, Thanks for being so responsive! I finally figured out a way to resolve the issue: all I have to do is to use `-only-needed` when merging the device bitcodes with llvm-link. However, since we actually need to instrument the host code as well, I encountered another issue when I tried to glue the instrumented host code and fatbin together. When I only instrumented the device code, I

2016 Mar 12

instrumenting device code with gpucc

Hey Jingyue, Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse the nvvm-reflect pass? Thanks! yuanfeng On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote: > According to the examples you

2016 Jan 20

Executing OpenMP 4.0 code on Nvidia's GPU

Hi Arpith, That is exactly what it is :). My bad, I thought I had copied the libraries to where LIBRARY_PATH points, but apparently they were copied to the wrong destination. Thanks a lot. On Wed, Jan 20, 2016 at 4:51 AM, Arpith C Jacob <acjacob at us.ibm.com> wrote: > Hi Ahmed, > > nvlink is unable to find the GPU OMP runtime library in its path. Does > LIBRARY_PATH point to

2016 Mar 10

instrumenting device code with gpucc

It's hard to tell what is wrong without a concrete example. E.g., what is the program you are instrumenting? What is the definition of the hook function? How did you link that definition with the binary? One thing suspicious to me is that you may have linked the definition of _Cool_MemRead_Hook as a host function instead of a device function. AFAIK, PTX assembly cannot be linked. So, if you
