Displaying 20 results from an estimated 3000 matches similar to: "[CUDA] Lost debug information when compiling CUDA code"
2016 Mar 05
2
instrumenting device code with gpucc
On Fri, Mar 4, 2016 at 5:50 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com>
wrote:
> Hi Jingyue,
>
> My name is Yuanfeng Peng, I'm a PhD student at UPenn. I'm sorry to bother
> you, but I'm having trouble with gpucc in my project, and I would be really
> grateful for your help!
>
> Currently we're trying to instrument CUDA code using LLVM 3.9, and
2016 Mar 13
2
instrumenting device code with gpucc
Hey Jingyue,
Thanks for being so responsive! I finally figured out a way to resolve the
issue: all I have to do is use `-only-needed` when merging the device
bitcodes with llvm-link.
However, since we actually need to instrument the host code as well, I
encountered another issue when I tried to glue the instrumented host code
and fatbin together. When I only instrumented the device code, I
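A minimal sketch of that merge step, with hypothetical file names; -only-needed makes llvm-link pull in only the symbols the first module actually references:

$ llvm-link -only-needed kernel.bc instrumentation.bc -o linked.bc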
2016 Mar 15
2
instrumenting device code with gpucc
Hi Jingyue,
Sorry to ask again, but how exactly could I glue the fatbin with the
instrumented host code? Or does that mean we actually cannot instrument both
the host & device code at the same time?
Thanks!
yuanfeng
On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote:
> Including the fatbin into the host code should be done in the frontend.
>
> On Mon, Mar 14, 2016
2016 Mar 10
4
instrumenting device code with gpucc
It's hard to tell what is wrong without a concrete example. E.g., what is
the program you are instrumenting? What is the definition of the hook
function? How did you link that definition with the binary?
One thing suspicious to me is that you may have linked the definition of
_Cool_MemRead_Hook as a host function instead of a device function. AFAIK,
PTX assembly cannot be linked. So, if you
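A sketch of the suspected mistake, assuming a hook with this rough shape (the real signature in the project may differ):

// Device definition: compiled into the device module, linkable at IR level.
extern "C" __device__ void _Cool_MemRead_Hook(void* addr) { /* ... */ }

// A host-only definition like the following would never be visible to
// device code, leaving the kernel's call unresolved:
// extern "C" void _Cool_MemRead_Hook(void* addr) { /* ... */ }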
2017 Sep 27
2
OrcJIT + CUDA Prototype for Cling
Dear LLVM-Developers and Vinod Grover,
we are trying to extend the cling C++ interpreter
(https://github.com/root-project/cling) with CUDA functionality for
Nvidia GPUs.
I already developed a prototype based on OrcJIT and am seeking
feedback. I am currently stuck on a runtime issue: my
interpreter prototype fails to execute kernels, ending with a CUDA runtime error.
=== How to use the
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue,
Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect
anchor didn't go away; ptxas is still complaining about the duplicate
definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse
the nvvm-reflect pass?
Thanks!
yuanfeng
On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote:
> According to the examples you
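One thing worth trying, as a sketch only (not verified against LLVM 3.9): since '_ZL21__nvvm_reflect_anchorv' has internal linkage, following -nvvm-reflect with global dead-code elimination may allow the anchor to be dropped once its uses are folded (file names hypothetical):

$ opt -nvvm-reflect -globaldce device.bc -o device.clean.bc
$ opt -nvvm-reflect -globaldce instr.bc -o instr.clean.bc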
2016 Aug 01
3
[GPUCC] link against libdevice
OK, I see the problem. You were right that we weren't picking up libdevice.
CUDA 7.0 only ships with the following libdevice binaries (found in
/path/to/cuda/nvvm/libdevice):
libdevice.compute_20.10.bc  libdevice.compute_30.10.bc
libdevice.compute_35.10.bc
If you ask for sm_50 with cuda 7.0, clang can't find a matching
libdevice binary, and it will apparently silently give up and try to
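A sketch of the usual workaround with CUDA 7.0: request an architecture whose libdevice actually ships, e.g. sm_35, which pairs with libdevice.compute_35 (file name hypothetical):

$ clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 --cuda-path=/path/to/cuda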
2017 Nov 14
1
OrcJIT + CUDA Prototype for Cling
Hi Lang,
thank you very much. I've used your code, and creating the object
file now works. I think the problem comes after creating the object file. When
I link the object file with ld I get an executable that works correctly.
After switching the clang and llvm libraries from the package-manager
version (.deb) to my own build with debug options, I get an
assert() failure.
In
void
2017 Feb 07
2
Clang option to provide list of target-subarchs.
There are at least four clang frontends for offloading to accelerators:
(1) CUDA clang, (2) OpenMP, (3) HCC, and (4) OpenCL. These frontends will
want to embed object code for multiple offload targets into a single
application binary to provide portability across different subarchitectures
(e.g., sm_35, sm_50) and across different architectures (e.g., nvptx64, amdgcn).
Problem: Different frontends
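For the CUDA frontend at least, clang already accepts the arch flag repeatedly and embeds one device image per subarchitecture into the fatbin; a sketch (file name hypothetical):

$ clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_50 --cuda-path=/usr/local/cuda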
2016 Aug 01
0
[GPUCC] link against libdevice
Hi Justin,
Thanks for your response! The clang & llvm I'm using was built from
source.
Below is the output of compiling with -v. Any suggestions would be
appreciated!
clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
2016 Aug 01
2
[GPUCC] link against libdevice
Hi, Yuanfeng.
What version of clang are you using? CUDA is only known to work at
the current head, so you must build clang yourself from source.
I suspect that's your problem, but if building from source doesn't fix
it, please attach the output of compiling with -v.
Regards,
-Justin
On Sun, Jul 31, 2016 at 9:24 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Directly
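A sketch of capturing that output (clang prints -v diagnostics on stderr; flags as elsewhere in the thread):

$ clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 --cuda-path=/usr/local/cuda -v 2> clang-verbose.log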
2017 Jun 09
1
NVPTX Back-end: relocatable device code support for dynamic parallelism
Hi everyone,
CUDA allows calling some runtime functions from device code as well. On
a multi-GPU system this lets the GPU determine its own device id via
cudaGetDevice().
Unfortunately I cannot get it working when compiling with clang. When
compiling with nvcc, relocatable device code needs to be set to true
(-rdc=true) and cudadevrt is needed when linking [0]. I did not
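For reference, the nvcc pipeline referenced in [0], as a sketch with hypothetical file names:

$ nvcc -arch=sm_35 -rdc=true -c kernel.cu -o kernel.o
$ nvcc -arch=sm_35 -rdc=true kernel.o -o app -lcudadevrt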
2017 Aug 02
2
CUDA compilation "No available targets are compatible with this triple." problem
Yes, I followed the guide. The same error showed up:
> clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 -I/usr/local/cuda/include -lcudart_static -ldl -lrt -pthread
error: unable to create target: 'No available targets are compatible with this triple.'
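This error usually means the clang in use was built without the required backends; a quick check (the llvm-config and llc must belong to the same build as that clang):

$ llvm-config --targets-built
$ llc --version | grep -iE 'x86-64|nvptx'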
2017 Aug 02
2
CUDA compilation "No available targets are compatible with this triple." problem
Hi,
I have trouble compiling CUDA code with Clang. The following is a command I tried:
> clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 --cuda-path=/usr/local/cuda
The error message is
error: unable to create target: 'No available targets are compatible with this triple.'
The info of the LLVM I'm using is as follows:
> clang++ --version
clang version 6.0.0
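If the X86 or NVPTX backend turns out to be missing from the build, reconfiguring LLVM with both enabled is one fix; a sketch of the cmake invocation:

$ cmake -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DCMAKE_BUILD_TYPE=Release /path/to/llvm/source
$ make -j$(nproc)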
2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example.
nvcc sync.cu -arch=sm_35 -ptx
gives
        // .globl  _Z3foov
.visible .entry _Z3foov()
{
        .reg .pred      %p<2>;
        .reg .s32       %r<3>;

        mov.u32         %r1, %tid.x;
        and.b32         %r2, %r1, 1;
        setp.eq.b32     %p1, %r2, 1;
        @!%p1 bra       BB7_2;
        bra.uni
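The attached sync.cu is not reproduced in the archive; a minimal kernel consistent with the PTX above (tid & 1 guarding the branch) might look like:

__global__ void foo() {
  if (threadIdx.x & 1) {
    __syncthreads();   // barrier reached only by odd threads, i.e. under divergence
  }
}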
2017 Jun 14
2
Separate compilation of CUDA code?
Hi,
I wonder whether the current version of LLVM supports separate compilation and linking of device code, i.e., is there a flag analogous to nvcc's --relocatable-device-code flag? If not, is there any plan to support this?
Thanks!
Yuanfeng Peng
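At the time of this message clang had no such flag; later releases added one (-fgpu-rdc, earlier spelled -fcuda-rdc). A sketch of the compile step with hypothetical file names:

$ clang++ -fgpu-rdc --cuda-gpu-arch=sm_35 -c a.cu b.cu
# Device-side linking still has to happen afterwards (e.g. via nvlink or by
# letting nvcc drive the final link); clang alone may not perform it.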
2020 Nov 17
2
JIT compiling CUDA source code
We have an application that allows the user to compile and execute C++ code
on the fly, using Orc JIT v2, via the LLJIT class. And we would like to
extend it to allow the user to provide CUDA source code as well, for GPU
programming. But I am having a hard time figuring out how to do it.
To JIT compile C++ code, we do basically as follows:
1. call Driver::BuildCompilation(), which returns a
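For reference, a minimal sketch of the host-side LLJIT setup (the open question here, registering the generated fatbin with the CUDA runtime, is not addressed by this):

#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/TargetSelect.h"
#include <memory>

llvm::Expected<std::unique_ptr<llvm::orc::LLJIT>> makeHostJIT() {
  // Host JIT only; device code must still go through the NVPTX path
  // and the CUDA runtime separately.
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();
  return llvm::orc::LLJITBuilder().create();
}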
2014 Jun 03
1
cuda-memcheck to debug CUDA-enabled R packages
I'm building a simple R extension around a CUDA-enabled dynamic library, and
I want to run the whole package with cuda-memcheck for debugging purposes. I
can run it just fine with Valgrind:
$ R --no-save -d valgrind < test.R
However, if I try the same thing with cuda-memcheck,
$ R --no-save -d cuda-memcheck < test.R
I get:
*** Further command line arguments ('--no-save ')
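One possible workaround (untested; the backend binary path is hypothetical and distribution-specific) is to invoke cuda-memcheck on the R backend binary directly instead of going through R's -d option:

$ cuda-memcheck /usr/lib/R/bin/exec/R --no-save < test.R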
2020 Sep 24
2
cuda __shfl_sync problem
Hi,
First of all, I'm not sure if I should be posting this here or in
cfe-dev, but here it goes.
In order to instrument CUDA kernels I first generate device IR with:
clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o
device.bc
I also have a library that contains the instrumentation stubs for which
I generate IR similarly, and I link it with the device IR
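The input file name is cut off above; a sketch of the full pipeline with hypothetical names:

$ clang++ -x cuda --cuda-device-only -emit-llvm -c --cuda-gpu-arch=sm_52 kernel.cu -o device.bc
$ clang++ -x cuda --cuda-device-only -emit-llvm -c --cuda-gpu-arch=sm_52 stubs.cu -o stubs.bc
$ llvm-link device.bc stubs.bc -o linked.bc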
2018 May 01
3
Compiling CUDA with clang on Windows
Dear all,
In the official document <https://llvm.org/docs/CompileCudaWithLLVM.html>,
it is mentioned that CUDA compilation is supported on Windows as of
2017-01-05. I used msys2 to install clang 5.0.1. Then I installed CUDA 8.0.
However, I basically could not compile any CUDA code with the prescribed
setup. I wonder if anyone has successfully compiled CUDA code with
clang on Windows.
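For reference, a sketch of the documented invocation adapted to a default Windows CUDA 8.0 install (path hypothetical). Note that clang's CUDA support on Windows expects an MSVC-compatible environment, which an msys2-installed (MinGW-targeting) clang may not provide; that mismatch alone could explain the failure.

> clang++ axpy.cu -o axpy.exe --cuda-gpu-arch=sm_35 --cuda-path="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0"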