similar to: [GPUCC] link against libdevice

Displaying 20 results from an estimated 300 matches similar to: "[GPUCC] link against libdevice"

2016 Aug 01
0
[GPUCC] link against libdevice
Directly CC-ing some folks who may be able to help. On Fri, Jul 29, 2016 at 6:27 AM Yuanfeng Peng via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi, > > I was trying to compile scalarProd.cu (from CUDA SDK) with the following > command: > > * clang++ -I../ -I/usr/local/cuda-7.0/samples/common/inc > --cuda-gpu-arch=sm_50 scalarProd.cu* > > but ended up with
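For context, a full invocation along these lines is what usually works (a sketch only: the CUDA installation path, include directory, library path, and GPU arch below are assumptions that have to match the local setup):

    # compile and link a CUDA SDK sample with clang, pointing it at the toolkit explicitly
    clang++ --cuda-path=/usr/local/cuda-7.0 \
            -I/usr/local/cuda-7.0/samples/common/inc \
            --cuda-gpu-arch=sm_35 \
            scalarProd.cu \
            -L/usr/local/cuda-7.0/lib64 -lcudart -o scalarProd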
2016 Aug 01
3
[GPUCC] link against libdevice
OK, I see the problem. You were right that we weren't picking up libdevice. CUDA 7.0 only ships with the following libdevice binaries (found in /path/to/cuda/nvvm/libdevice): libdevice.compute_20.10.bc libdevice.compute_30.10.bc libdevice.compute_35.10.bc If you ask for sm_50 with CUDA 7.0, clang can't find a matching libdevice binary, and it will apparently silently give up and try to
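One way to check which libdevice variants a given toolkit ships, and to work around a missing one, is sketched below (the toolkit path is an assumption; the file names are the ones listed in this thread):

    # list the libdevice variants that ship with the toolkit
    ls /usr/local/cuda-7.0/nvvm/libdevice
    #   libdevice.compute_20.10.bc  libdevice.compute_30.10.bc  libdevice.compute_35.10.bc

    # CUDA 7.0 has no sm_50 libdevice, so either target an arch that has one
    # or move to a CUDA release that ships libdevice for sm_50
    clang++ --cuda-path=/usr/local/cuda-7.0 --cuda-gpu-arch=sm_35 scalarProd.cu \
            -L/usr/local/cuda-7.0/lib64 -lcudart -o scalarProd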
2016 Aug 01
2
[GPUCC] link against libdevice
Hi, Yuanfeng. What version of clang are you using? CUDA is only known to work at tip of head, so you must build clang yourself from source. I suspect that's your problem, but if building from source doesn't fix it, please attach the output of compiling with -v. Regards, -Justin On Sun, Jul 31, 2016 at 9:24 PM, Chandler Carruth <chandlerc at google.com> wrote: > Directly
2016 Aug 01
0
[GPUCC] link against libdevice
Hi Justin, Thanks for your response! The clang & llvm I'm using was built from source. Below is the output of compiling with -v. Any suggestions would be appreciated! *clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)* *Target: x86_64-unknown-linux-gnu* *Thread model: posix* *InstalledDir: /usr/local/bin* *Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8*
2016 Mar 05
2
instrumenting device code with gpucc
On Fri, Mar 4, 2016 at 5:50 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote: > Hi Jingyue, > > My name is Yuanfeng Peng, I'm a PhD student at UPenn. I'm sorry to bother > you, but I'm having trouble with gpucc in my project, and I would be really > grateful for your help! > > Currently we're trying to instrument CUDA code using LLVM 3.9, and
2016 Mar 10
4
instrumenting device code with gpucc
It's hard to tell what is wrong without a concrete example. E.g., what is the program you are instrumenting? What is the definition of the hook function? How did you link that definition with the binary? One thing that looks suspicious to me is that you may have linked the definition of _Cool_MemRead_Hook as a host function instead of a device function. AFAIK, PTX assembly cannot be linked. So, if you
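A sketch of what "link the hook as device code" could look like, assuming the hook is defined in a __device__ function in a hypothetical hook.cu and that instrumentation happens at the IR level (file names and the arch are assumptions):

    # compile the hook definition as *device* IR rather than host code
    clang++ -emit-llvm -c --cuda-gpu-arch=sm_35 --cuda-device-only hook.cu -o hook.bc

    # merge it with the instrumented kernel IR *before* PTX generation,
    # since PTX assembly itself cannot be linked
    llvm-link instrumented_kernel.bc hook.bc -o merged.bc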
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue, Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse the nvvm-reflect pass? Thanks! yuanfeng On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote: > According to the examples you
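For reference, the invocations discussed in this thread look roughly like the sketch below (file names are placeholders; whether -nvvm-reflect actually removes the anchor depends on the LLVM version, and the -only-needed link is the workaround reported later in the thread):

    # run the NVVM reflect pass on each device bitcode file (legacy pass-manager syntax)
    opt -nvvm-reflect a.bc -o a.reflect.bc
    opt -nvvm-reflect b.bc -o b.reflect.bc

    # when merging, pull in only the symbols that are actually referenced,
    # instead of carrying a second copy of the internal anchor function along
    llvm-link -only-needed a.reflect.bc b.reflect.bc -o merged.bc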
2016 Apr 08
2
[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?
Yeah, '.' is the direct reason for the ptxas failure here. I'm curious, however, what the purpose of nvvm_reflect_anchorv() is here, and why the front-end always generates this function. Since the current PTX emission doesn't mangle dots, it would be a reasonable workaround for me to prevent the front-end from generating this function in the first place. Is there any
2016 Apr 07
2
[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?
Hi, I needed to compile a CUDA source file (say, a.cu) into IR (a.bc), and then merge a.bc with another bitcode file (b.bc, compiled from b.cu). So I used *llvm-link a.bc b.bc -o c.bc* However, I noticed that an internal function '* _ZL21__nvvm_reflect_anchorv() *' is defined in both a.bc & b.bc, and when merging these two files, one of the two definitions was renamed to
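A sketch of the workflow being described, with the -only-needed variant that the follow-up thread reports as the fix (the arch and the --cuda-device-only step are assumptions about how the per-file device IR was produced):

    # produce device IR for each translation unit
    clang++ -emit-llvm -c --cuda-gpu-arch=sm_35 --cuda-device-only a.cu -o a.bc
    clang++ -emit-llvm -c --cuda-gpu-arch=sm_35 --cuda-device-only b.cu -o b.bc

    # a plain merge keeps both copies of the internal anchor, renaming one of them
    # (the renamed symbol gains a '.' suffix, which ptxas then rejects)
    llvm-link a.bc b.bc -o c.bc

    # linking with -only-needed keeps only referenced definitions, which is the
    # workaround reported later in this thread
    llvm-link -only-needed a.bc b.bc -o c.bc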
2016 Apr 09
2
[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?
David's change makes nvvm_reflect_anchor unnecessary. The issue with dots in names generated by llvm still needs to be fixed. On Apr 9, 2016 8:32 AM, "Jingyue Wu" <jingyue at google.com> wrote: > Artem, > > With David's http://reviews.llvm.org/rL265060, do you think > __nvvm_reflect_anchor is still necessary? > > On Fri, Apr 8, 2016 at 9:37 AM, Yuanfeng
2016 Mar 13
2
instrumenting device code with gpucc
Hey Jingyue, Thanks for being so responsive! I finally figured out a way to resolve the issue: all I have to do is to use `-only-needed` when merging the device bitcodes with llvm-link. However, since we actually need to instrument the host code as well, I encountered another issue when I tried to glue the instrumented host code and fatbin together. When I only instrumented the device code, I
2016 Mar 15
2
instrumenting device code with gpucc
Hi Jingyue, Sorry to ask again, but how exactly could I glue the fatbin with the instrumented host code? Or does it mean we actually cannot instrument both the host & device code at the same time? Thanks! yuanfeng On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote: > Including fatbin into host code should be done in frontend. > > On Mon, Mar 14, 2016
2019 Nov 13
2
AMDGPU and math functions
Does anyone know whether there is yet support for math functions in AMD GPU kernels? In the NVIDIA world they provide the libdevice IR module which can be linked to an existing module containing the kernel. In other words they provide all math functions on IR level. NVIDIA even claims that libdevice is actually device-specific (per compute capability). I was wondering how that is done on the AMD
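On the NVIDIA side, "linking libdevice" is just an IR-level link step, essentially what clang's CUDA driver does internally when it finds a matching libdevice binary. A minimal sketch (paths and file names are assumptions):

    # link the device-specific libdevice bitcode into the kernel's IR module
    llvm-link kernel.bc \
        /usr/local/cuda/nvvm/libdevice/libdevice.compute_35.10.bc \
        -o kernel_linked.bc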
2019 Nov 13
2
AMDGPU and math functions
There certainly is support; after all, AMD supports both OpenCL and HIP (a dialect of C++ very close to CUDA). AMD device libraries (in bitcode form) are installed when ROCm ( https://rocm.github.io/ ) is installed. AMD device libraries are mostly written in (OpenCL) C and open source at https://github.com/RadeonOpenCompute/ROCm-Device-Libs . They are configured by linking in a number of tiny
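The same llvm-link idea applies on the AMD side, except the math library is split across several small bitcode files plus tiny "control" libraries that configure it. A sketch only: the install path and exact file names below differ between ROCm releases and are assumptions here.

    # link the AMD device library plus the small control libraries that configure it
    llvm-link kernel.bc \
        /opt/rocm/lib/ocml.amdgcn.bc \
        /opt/rocm/lib/oclc_isa_version_900.amdgcn.bc \
        /opt/rocm/lib/oclc_finite_only_off.amdgcn.bc \
        /opt/rocm/lib/oclc_daz_opt_off.amdgcn.bc \
        -o kernel_linked.bc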
2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
Hi, I needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck. Specifically, below is what I did: 1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu; 2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated
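The first steps of the workflow described above, kept as a sketch (the output file names follow the thread; the pass plugin and pass name in step 2 are hypothetical placeholders for whatever instrumentation is being applied):

    # 1) emit IR for host and device with debug info
    clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu
    #    this produces a host bitcode file and a-cuda-nvptx64-nvidia-cuda-sm_35.bc for the device

    # 2) instrument the device IR, keeping the -g debug metadata intact
    opt -load ./MyInstrumentation.so -my-instrument \
        a-cuda-nvptx64-nvidia-cuda-sm_35.bc -o a.instr.bc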
2013 Jun 05
2
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Thanks for the info! I would be glad to hear of any issues you have encountered on this path. I tried to make sure the 3.3 release was fully compatible with the libdevice implementation shipping with 5.5 (and as far as I know, it is). It's just not an officially supported configuration. Also, I've been meaning to address your -drvcuda issue. How would you feel about making that a part
2013 Jun 05
0
[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all
Dear all, FWIW, I've tested libdevice.compute_20.10.bc and libdevice.compute_30.10.bc from /cuda/nvvm/libdevice shipped with the CUDA 5.5 preview. The IR is compatible with the LLVM 3.4 trunk that we use. Results are correct; performance is almost the same as what we had before with cicc-sniffed IR, or maybe <10% better. Will test libdevice.compute_35.10.bc once we get K20 support. Thanks for
2017 Feb 07
2
Clang option to provide list of target-subarchs.
There are at least four clang frontends for offloading to accelerators: (1) CUDA clang, (2) OpenMP, (3) HCC, and (4) OpenCL. These frontends will want to embed object code for multiple offload targets into a single application binary to provide portability across different subarchitectures (e.g. sm_35, sm_50) and across different architectures (e.g. nvptx64, amdgcn). Problem: Different frontends
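For the CUDA frontend this already looks like repeating the arch flag; whether every frontend accepts a common spelling is exactly the open question of this thread. A sketch (the library path is an assumption):

    # embed device code for several NVIDIA subarchitectures in one binary
    clang++ --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_50 app.cu \
            -L/usr/local/cuda/lib64 -lcudart -o app

    # newer clang releases also accept the more generic spelling
    clang++ --offload-arch=sm_35 --offload-arch=sm_50 app.cu \
            -L/usr/local/cuda/lib64 -lcudart -o app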
2017 Jun 14
2
Separate compilation of CUDA code?
Hi, I wonder whether the current version of LLVM supports separate compilation and linking of device code, i.e., is there a flag analogous to nvcc's --relocatable-device-code flag? If not, is there any plan to support this? Thanks! Yuanfeng Peng
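Later clang releases did add a relocatable-device-code mode. A sketch only: the flag spelling has changed over time, and which one a given clang accepts (and how much of the device link the driver performs itself) has to be checked against that version's documentation.

    # relocatable device code with clang's CUDA support
    # (spelled -fcuda-rdc in early releases, -fgpu-rdc in newer ones); depending on
    # the clang version, the final device link may still need to be driven separately
    clang++ -fgpu-rdc --cuda-gpu-arch=sm_35 -c a.cu -o a.o
    clang++ -fgpu-rdc --cuda-gpu-arch=sm_35 -c b.cu -o b.o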
2017 Jun 09
1
NVPTX Back-end: relocatable device code support for dynamic parallelism
Hi everyone, CUDA allows certain runtime functions to be called from device code as well. On a multi-GPU system this allows the GPU to determine its own device id via cudaGetDevice(). Unfortunately I cannot get it working when compiling with clang. When compiling with nvcc, relocatable device code needs to be enabled (-rdc=true) and cudadevrt is needed when linking [0]. I did not
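For reference, the nvcc side of what the message describes looks roughly like this (the source file name is a placeholder; sm_35 is assumed because device-side runtime calls require compute capability 3.5 or newer):

    # nvcc: device-side runtime calls need relocatable device code
    # plus the device runtime library at link time
    nvcc -arch=sm_35 -rdc=true kernel.cu -o kernel -lcudadevrt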