Displaying 16 results from an estimated 16 matches for "sm_35".
2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
...needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck.
Specifically, below is what I did:
1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu;
2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated in the previous step), inserting a call to a hook function before each device memory access. The hook function is defined in another file, b.cu. Let's say we get a file named instrumented-a-device.bc after th...
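The workflow in this excerpt might be sketched as shell commands (the b.cu bitcode name and the llvm-link step are my assumptions, not stated in the thread):

```shell
# Step 1: -g -emit-llvm -c emits host IR plus device bitcode
# (a-cuda-nvptx64-nvidia-cuda-sm_35.bc) with debug info attached.
clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu

# Step 2 (sketch): compile the hook definitions in b.cu the same
# way, then link them into the instrumented device bitcode.
clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c b.cu
llvm-link instrumented-a-device.bc b-cuda-nvptx64-nvidia-cuda-sm_35.bc \
    -o linked-device.bc
```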
2017 Feb 07
2
Clang option to provide list of target-subarchs.
There are at least four clang frontends for offloading to accelerators:
(1) CUDA Clang, (2) OpenMP, (3) HCC, and (4) OpenCL. These frontends will
want to embed object code for multiple offload targets into a single
application binary to provide portability across different subarchitectures
(e.g. sm_35, sm_50) and across different architectures (e.g. nvptx64, amdgcn).
Problem: Different frontends are using different flags to provide a
list of subarchitectures. For example, cuda clang repeats the flag
“--cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_50” and HCC uses
“--amdgpu-target=gfx701 --amdgpu...
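The divergence described above looks like this on a command line (flag spellings are taken from the excerpt; the file names and the gfx803 value are hypothetical):

```shell
# CUDA clang: one --cuda-gpu-arch per subarchitecture.
clang++ axpy.cu --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_50 -o axpy

# HCC: the same idea, but spelled --amdgpu-target.
hcc app.cpp --amdgpu-target=gfx701 --amdgpu-target=gfx803 -o app
```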
2017 Jun 09
1
NVPTX Back-end: relocatable device code support for dynamic parallelism
...rary "/opt/cuda-8.0/bin/../nvvm/libdevice/libdevice.compute_35.10.bc" --device-c --orig_src_file_name "../testApps/cuda_id_test.cu" "/tmp/tmpxft_00007040_00000000-13_cuda_id_test.cpp3.i" -o "/tmp/tmpxft_00007040_00000000-6_cuda_id_test.ptx"
#$ ptxas -arch=sm_35 -m64 --compile-only "/tmp/tmpxft_00007040_00000000-6_cuda_id_test.ptx" -o "/tmp/tmpxft_00007040_00000000-14_cuda_id_test.sm_35.cubin"
#$ fatbinary --create="/tmp/tmpxft_00007040_00000000-2_cuda_id_test.fatbin" -64 --cmdline="--compile-only " "--image...
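The ptxas/fatbinary lines above are part of nvcc's separate-compilation pipeline; driven through nvcc directly, the relocatable-device-code flow is roughly (a sketch; file names are hypothetical):

```shell
# -rdc=true keeps device code relocatable so it can be linked
# across translation units (required for dynamic parallelism).
nvcc -arch=sm_35 -rdc=true -c cuda_id_test.cu -o cuda_id_test.o
# Device-link step: nvlink resolves cross-file device symbols.
nvcc -arch=sm_35 -dlink cuda_id_test.o -o dlink.o -lcudadevrt
# Final host link.
nvcc cuda_id_test.o dlink.o -lcudadevrt -o cuda_id_test
```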
2017 Aug 02
2
CUDA compilation "No available targets are compatible with this triple." problem
Yes, I followed the guide. The same error showed up:
>clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 -I/usr/local/cuda/include -lcudart_static -ldl -lrt -pthread
error: unable to create target: 'No available targets are compatible with this triple.'
________________________________
From: Kevin Choi <code.kchoi at gmail.com>
Sent: Wednesday, August 2, 2017 3...
2017 Aug 02
2
CUDA compilation "No available targets are compatible with this triple." problem
Hi,
I have trouble compiling CUDA code with Clang. The following is a command I tried:
> clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 --cuda-path=/usr/local/cuda
The error message is
error: unable to create target: 'No available targets are compatible with this triple.'
The info of the LLVM I'm using is as follows:
> clang++ --version
clang version 6.0.0 (http://llvm.org/git/clang.git 16a0981eccf1bfcc9ba9287...
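One common cause of this error (a hedged suggestion, not stated in the excerpt) is an LLVM build that does not include the NVPTX backend; the registered targets can be listed with:

```shell
# If no nvptx entries appear here, rebuild LLVM with
# -DLLVM_TARGETS_TO_BUILD including "NVPTX".
llc --version | grep -i nvptx
```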
2016 Mar 05
2
instrumenting device code with gpucc
...-4 by invoking subcommands. Therefore, you can use "clang
-###" to dump all the subcommands, and then find the ones for step 2-4. For
example,
$ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc
-L/usr/local/cuda/lib64 -lcudart_static -lcuda -ldl -lrt -pthread
--cuda-gpu-arch=sm_35
clang version 3.9.0 (http://llvm.org/git/clang.git
4ce165e39e7b185e394aa713d9adffd920288988) (http://llvm.org/git/llvm.git
2550ef485b6f9668bb7a4daa7ab276b6501492df)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/google/home/jingyue/Work/llvm/install/bin
"/usr/l...
2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example.
nvcc sync.cu -arch=sm_35 -ptx
gives
// .globl _Z3foov
.visible .entry _Z3foov(
)
{
    .reg .pred %p<2>;
    .reg .s32 %r<3>;
    mov.u32 %r1, %tid.x;
    and.b32 %r2, %r1, 1;
    setp.eq.b32 %p1, %r2, 1;
    @!%p1 bra BB7_2;...
2016 Mar 10
4
instrumenting device code with gpucc
...t;> "clang -###" to dump all the subcommands, and then find the ones for step
>> 2-4. For example,
>>
>> $ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc
>> -L/usr/local/cuda/lib64 -lcudart_static -lcuda -ldl -lrt -pthread
>> --cuda-gpu-arch=sm_35
>>
>> clang version 3.9.0 (http://llvm.org/git/clang.git
>> 4ce165e39e7b185e394aa713d9adffd920288988) (http://llvm.org/git/llvm.git
>> 2550ef485b6f9668bb7a4daa7ab276b6501492df)
>> Target: x86_64-unknown-linux-gnu
>> Thread model: posix
>> InstalledDir: /usr...
2018 May 01
3
Compiling CUDA with clang on Windows
Dear all,
In the official document <https://llvm.org/docs/CompileCudaWithLLVM.html>,
it is mentioned that CUDA compilation is supported on Windows as of
2017-01-05. I used msys2 to install clang 5.0.1. Then I installed cuda 8.0.
However, I basically could not compile any CUDA code with the prescribed
setup. I wonder if anyone has successfully compiled CUDA code with
clang on Windows.
2016 Aug 01
3
[GPUCC] link against libdevice
...l whether that's safe
in general. I'll look into this as well.
Anyway if you build with CUDA 7.5 your problem should go away, because
CUDA 7.5 has a libdevice binary for compute_50. Just pass
--cuda-path=/path/to/cuda-7.5. Alternatively you could continue
building with cuda 7.0 and pass sm_35 as your gpu arch. clang always
embeds ptx in the binaries, so the result should still run on your
sm_50 card (although your machine will have to jit the ptx on
startup).
As a third alternative, you could symlink your
libdevice.compute_35.10.bc to libdevice.compute_50.10.bc, and...maybe
that would...
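The three alternatives in this reply could be sketched as (paths are hypothetical):

```shell
# 1) Use CUDA 7.5, which ships a libdevice binary for compute_50.
clang++ axpy.cu -o axpy --cuda-path=/usr/local/cuda-7.5 \
    --cuda-gpu-arch=sm_50
# 2) Stay on CUDA 7.0 but target sm_35; clang embeds PTX, which
#    is JIT-compiled for the sm_50 card at program startup.
clang++ axpy.cu -o axpy --cuda-path=/usr/local/cuda-7.0 \
    --cuda-gpu-arch=sm_35
# 3) (untested in the thread) Alias the compute_35 libdevice
#    under the compute_50 name.
ln -s libdevice.compute_35.10.bc libdevice.compute_50.10.bc
```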
2015 Aug 21
3
[CUDA/NVPTX] is inlining __syncthreads allowed?
Hi Justin,
Is a compiler allowed to inline a function that calls __syncthreads? I saw
nvcc does that, but not sure it's valid though. For example,
void foo() {
  __syncthreads();
}

if (threadIdx.x % 2 == 0) {
  ...
  foo();
} else {
  ...
  foo();
}
Before inlining, all threads meet at one __syncthreads(). After inlining
if (threadIdx.x % 2 == 0) {
  ...
  __syncthreads();
} else {
  ...
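Written out as a complete kernel, the situation in this excerpt is (the kernel wrapper is a hypothetical reconstruction):

```cuda
__device__ void foo() {
  __syncthreads();
}

__global__ void kernel() {
  if (threadIdx.x % 2 == 0) {
    foo();  // before inlining: one shared barrier inside foo()
  } else {
    foo();
  }
  // After inlining, each branch contains its own __syncthreads()
  // call site, so odd and even threads wait at different barriers;
  // whether that is still valid is the question being asked.
}
```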
2016 Aug 01
0
[GPUCC] link against libdevice
Hi Justin,
Thanks for your response! The clang & LLVM I'm using were built from
source.
Below is the output of compiling with -v. Any suggestions would be
appreciated!
clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
2013 Jul 18
2
question about Makeconf and nvcc/CUDA
Dear R development:
I'm not sure if this is the appropriate list, but it's a start.
I would like to put together a package which contains a CUDA program on Windows 7. I believe it has to do with the Makeconf file in the etc directory.
But when I just use nvcc with the shared option, dyn.load succeeds, yet the is.loaded function returns FALSE.
2016 Jan 20
4
Executing OpenMP 4.0 code on Nvidia's GPU
Hi Arpith,
That is exactly what it is :).
My bad, I thought I had copied the libraries to where LIBRARY_PATH was
pointing, but apparently they went to the wrong destination.
Thanks a lot.
On Wed, Jan 20, 2016 at 4:51 AM, Arpith C Jacob <acjacob at us.ibm.com> wrote:
> Hi Ahmed,
>
> nvlink is unable to find the GPU OMP runtime library in its path. Does
> LIBRARY_PATH point to
2016 Aug 01
2
[GPUCC] link against libdevice
Hi, Yuanfeng.
What version of clang are you using? CUDA is only known to work at
tip of head, so you must build clang yourself from source.
I suspect that's your problem, but if building from source doesn't fix
it, please attach the output of compiling with -v.
Regards,
-Justin
On Sun, Jul 31, 2016 at 9:24 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Directly
2015 Jun 09
2
[LLVMdev] Supporting heterogeneous computing in llvm.
Hi Sergos and Samuel,
Thanks for the links, I've got it mostly working now.
I still have a problem with linking the code. It seems that the clang
driver doesn't pass its library search path to nvlink when linking the
generated cuda code to the target library, resulting in it not correctly
finding libtarget-nvptx.a. Is there some flag or environment variable
that I should set here?