search for: sm_35

Displaying 16 results from an estimated 16 matches for "sm_35".

2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
...needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck. Specifically, below is what I did: 1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu; 2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated in the previous step), inserting a call to a hook function before each device memory access. The hook function is defined in another file, b.cu. Let's say we get a file named intrumented-a-device.bc after th...
2017 Feb 07
2
Clang option to provide list of target-subarchs.
There are at least four clang frontends for offloading to accelerators: (1) CUDA clang, (2) OpenMP, (3) HCC, and (4) OpenCL. These frontends will want to embed object code for multiple offload targets into a single application binary to provide portability across different subarchitectures (e.g. sm_35, sm_50) and across different architectures (e.g. nvptx64, amdgcn). Problem: Different frontends are using different flags to provide a list of subarchitectures. For example, CUDA clang repeats the flag “--cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_50” and HCC uses “--amdgpu-target=gfx701 --amdgpu...
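To make the contrast concrete, here is a minimal C++ sketch (not from the thread) of a driver-side helper that gathers subarchitectures both from the repeated "--cuda-gpu-arch=" form and from a single comma-separated flag. The flag name "--offload-subarchs=" and the function collect_subarchs are hypothetical, invented for illustration; they are not existing Clang options or APIs.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Gather GPU subarchitectures from argv-style arguments.
// Accepts the repeated form used by CUDA clang ("--cuda-gpu-arch=sm_35")
// and a HYPOTHETICAL list-valued flag ("--offload-subarchs=sm_35,sm_50").
std::vector<std::string> collect_subarchs(const std::vector<std::string>& args) {
    const std::string repeated = "--cuda-gpu-arch=";
    const std::string listed = "--offload-subarchs=";  // hypothetical name
    std::vector<std::string> out;
    for (const auto& a : args) {
        if (a.rfind(repeated, 0) == 0) {          // starts with repeated prefix
            out.push_back(a.substr(repeated.size()));
        } else if (a.rfind(listed, 0) == 0) {     // starts with list prefix
            std::stringstream ss(a.substr(listed.size()));
            std::string item;
            while (std::getline(ss, item, ',')) { // split on commas
                if (!item.empty()) out.push_back(item);
            }
        }
    }
    return out;
}
```

Either spelling yields the same ordered list; the sketch only shows why a single list-valued flag is easy to share across frontends, which is the motivation the proposal gives.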
2017 Jun 09
1
NVPTX Back-end: relocatable device code support for dynamic parallelism
...rary "/opt/cuda-8.0/bin/../nvvm/libdevice/libdevice.compute_35.10.bc" --device-c --orig_src_file_name "../testApps/cuda_id_test.cu" "/tmp/tmpxft_00007040_00000000-13_cuda_id_test.cpp3.i" -o "/tmp/tmpxft_00007040_00000000-6_cuda_id_test.ptx" #$ ptxas -arch=sm_35 -m64 --compile-only "/tmp/tmpxft_00007040_00000000-6_cuda_id_test.ptx" -o "/tmp/tmpxft_00007040_00000000-14_cuda_id_test.sm_35.cubin" #$ fatbinary --create="/tmp/tmpxft_00007040_00000000-2_cuda_id_test.fatbin" -64 --cmdline="--compile-only " "--image...
2017 Aug 02
2
CUDA compilation "No available targets are compatible with this triple." problem
Yes, I followed the guide. The same error showed up: > clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 -I/usr/local/cuda/include -lcudart_static -ldl -lrt -pthread error: unable to create target: 'No available targets are compatible with this triple.' From: Kevin Choi <code.kchoi at gmail.com> Sent: Wednesday, August 2, 2017 3...
2017 Aug 02
2
CUDA compilation "No available targets are compatible with this triple." problem
Hi, I have trouble compiling CUDA code with Clang. The following is a command I tried: > clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 --cuda-path=/usr/local/cuda The error message is: error: unable to create target: 'No available targets are compatible with this triple.' The info of the LLVM I'm using is as follows: > clang++ --version clang version 6.0.0 (http://llvm.org/git/clang.git 16a0981eccf1bfcc9ba9287...
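The message "No available targets are compatible with this triple." is what LLVM's target registry reports when no registered backend matches the requested triple. For CUDA device compilation that triple is nvptx64-nvidia-cuda, so one likely cause (not confirmed in these snippets) is an LLVM build that omits the NVPTX backend. "llc --version" prints a "Registered Targets:" section; the minimal C++ sketch below checks captured output for a target name. The helper name has_registered_target is hypothetical, and the sketch assumes the one-target-per-line layout used in the test data.

```cpp
#include <sstream>
#include <string>

// Return true if the "Registered Targets:" section of captured
// `llc --version` output lists `target` (e.g. "nvptx64") as an entry.
bool has_registered_target(const std::string& version_output,
                           const std::string& target) {
    std::istringstream in(version_output);
    std::string line;
    bool in_targets = false;
    while (std::getline(in, line)) {
        if (line.find("Registered Targets:") != std::string::npos) {
            in_targets = true;  // entries follow this header line
            continue;
        }
        if (!in_targets) continue;
        std::istringstream fields(line);
        std::string name;       // first whitespace-separated field
        if (fields >> name && name == target) return true;
    }
    return false;
}
```

Exact matching on the first field keeps "nvptx" (32-bit) from being mistaken for "nvptx64".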
2016 Mar 05
2
instrumenting device code with gpucc
...-4 by invoking subcommands. Therefore, you can use "clang -###" to dump all the subcommands, and then find the ones for step 2-4. For example, $ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc -L/usr/local/cuda/lib64 -lcudart_static -lcuda -ldl -lrt -pthread --cuda-gpu-arch=sm_35 clang version 3.9.0 (http://llvm.org/git/clang.git 4ce165e39e7b185e394aa713d9adffd920288988) (http://llvm.org/git/llvm.git 2550ef485b6f9668bb7a4daa7ab276b6501492df) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/local/google/home/jingyue/Work/llvm/install/bin "/usr/l...
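Following that suggestion, the device-side steps can be picked out of a saved "clang -###" dump (clang writes it to stderr, so a redirect such as 2> cmds.txt captures it). This is a minimal C++ sketch, not part of the thread; it assumes each subcommand is printed on its own line with quoted arguments, that the device cc1 line carries "-triple" "nvptx64-nvidia-cuda", and that the ptxas and fatbinary steps appear as separate programs. The function names are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Basename of the first quoted token on a "clang -###" line (the program).
static std::string program_of(const std::string& line) {
    const auto b = line.find('"');
    if (b == std::string::npos) return "";
    const auto e = line.find('"', b + 1);
    if (e == std::string::npos) return "";
    const std::string path = line.substr(b + 1, e - b - 1);
    const auto slash = path.rfind('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Keep the device-side subcommands: the cc1 job whose -triple is
// nvptx64-nvidia-cuda, plus the ptxas and fatbinary jobs. The host cc1
// job mentions nvptx64 only via "-aux-triple", which does not match.
std::vector<std::string> device_steps(const std::string& dump) {
    const std::string device_triple = "\"-triple\" \"nvptx64-nvidia-cuda\"";
    std::vector<std::string> out;
    std::istringstream in(dump);
    std::string line;
    while (std::getline(in, line)) {
        const std::string prog = program_of(line);
        if (line.find(device_triple) != std::string::npos ||
            prog == "ptxas" || prog == "fatbinary") {
            out.push_back(line);
        }
    }
    return out;
}
```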
2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2;...
2016 Mar 10
4
instrumenting device code with gpucc
...>> "clang -###" to dump all the subcommands, and then find the ones for step >> 2-4. For example, >> >> $ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc >> -L/usr/local/cuda/lib64 -lcudart_static -lcuda -ldl -lrt -pthread >> --cuda-gpu-arch=sm_35 >> >> clang version 3.9.0 (http://llvm.org/git/clang.git >> 4ce165e39e7b185e394aa713d9adffd920288988) (http://llvm.org/git/llvm.git >> 2550ef485b6f9668bb7a4daa7ab276b6501492df) >> Target: x86_64-unknown-linux-gnu >> Thread model: posix >> InstalledDir: /usr...
2018 May 01
3
Compiling CUDA with clang on Windows
Dear all, In the official document <https://llvm.org/docs/CompileCudaWithLLVM.html>, it is mentioned that CUDA compilation is supported on Windows as of 2017-01-05. I used msys2 to install clang 5.0.1. Then I installed CUDA 8.0. However, I could not compile basically any CUDA code with the prescribed settings. I wonder if anyone has successfully compiled CUDA code with clang on Windows.
2016 Aug 01
3
[GPUCC] link against libdevice
...l whether that's safe in general. I'll look into this as well. Anyway if you build with CUDA 7.5 your problem should go away, because CUDA 7.5 has a libdevice binary for compute_50. Just pass --cuda-path=/path/to/cuda-7.5. Alternatively you could continue building with cuda 7.0 and pass sm_35 as your gpu arch. clang always embeds ptx in the binaries, so the result should still run on your sm_50 card (although your machine will have to jit the ptx on startup). As a third alternative, you could symlink your libdevice.compute_35.10.bc to libdevice.compute_50.10.bc, and...maybe that would...
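The symlink workaround amounts to "use the newest libdevice at or below the requested arch". A minimal C++ sketch of that hypothetical fallback policy follows; it is not clang's actual selection logic, the function names are invented, and it assumes the libdevice.compute_NN.10.bc naming seen in this thread.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Build a name such as "libdevice.compute_35.10.bc". The ".10" suffix
// follows the files named in this thread; other releases may differ.
std::string libdevice_name(int compute) {
    return "libdevice.compute_" + std::to_string(compute) + ".10.bc";
}

// Parse "sm_50" -> 50; anything else yields std::nullopt.
std::optional<int> sm_number(const std::string& arch) {
    if (arch.rfind("sm_", 0) != 0 || arch.size() == 3) return std::nullopt;
    try {
        return std::stoi(arch.substr(3));
    } catch (const std::exception&) {  // invalid_argument or out_of_range
        return std::nullopt;
    }
}

// Hypothetical policy: pick the newest available compute version that does
// not exceed the requested sm version (e.g. compute_35 for sm_50 when no
// compute_50 file is installed). PTX built this way is JIT-compiled on newer
// GPUs at startup, as the reply above notes.
std::optional<std::string> pick_libdevice(const std::string& arch,
                                          const std::vector<int>& available) {
    const std::optional<int> sm = sm_number(arch);
    if (!sm) return std::nullopt;
    std::optional<int> best;
    for (int c : available) {
        if (c <= *sm && (!best || c > *best)) best = c;
    }
    if (!best) return std::nullopt;
    return libdevice_name(*best);
}
```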
2015 Aug 21
3
[CUDA/NVPTX] is inlining __syncthreads allowed?
Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...
2016 Aug 01
0
[GPUCC] link against libdevice
Hi Justin, Thanks for your response! The clang & llvm I'm using was built from source. Below is the output of compiling with -v. Any suggestions would be appreciated! clang version 3.9.0 (trunk 270145) (llvm/trunk 270133) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/local/bin Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
2013 Jul 18
2
question about Makeconf and nvcc/CUDA
Dear R developers: I'm not sure if this is the appropriate list, but it's a start. I would like to put together a package that contains a CUDA program on Windows 7. I believe the solution has to do with the Makeconf file in the etc directory. When I just use nvcc with the shared option, I can use the dyn.load command, but when I use the is.loaded function, it shows FALSE.
2016 Jan 20
4
Executing OpenMP 4.0 code on Nvidia's GPU
Hi Arpith, That is exactly what it is :). My bad, I thought I had copied the libraries to where LIBRARY_PATH was pointing, but apparently they were copied to the wrong destination. Thanks a lot. On Wed, Jan 20, 2016 at 4:51 AM, Arpith C Jacob <acjacob at us.ibm.com> wrote: > Hi Ahmed, > > nvlink is unable to find the GPU OMP runtime library in its path. Does > LIBRARY_PATH point to
2016 Aug 01
2
[GPUCC] link against libdevice
Hi, Yuanfeng. What version of clang are you using? CUDA is only known to work at tip of head, so you must build clang yourself from source. I suspect that's your problem, but if building from source doesn't fix it, please attach the output of compiling with -v. Regards, -Justin On Sun, Jul 31, 2016 at 9:24 PM, Chandler Carruth <chandlerc at google.com> wrote: > Directly
2015 Jun 09
2
[LLVMdev] Supporting heterogeneous computing in llvm.
Hi Sergos and Samuel, Thanks for the links, I've got it mostly working now. I still have a problem with linking the code. It seems that the clang driver doesn't pass its library search path to nvlink when linking the generated cuda code to the target library, resulting in it not correctly finding libtarget-nvptx.a. Is there some flag or environment variable that I should set here?