search for: nvcc

Displaying 20 results from an estimated 75 matches for "nvcc".

2006 Nov 09
4
Running a 32-bit application on CentOS3-x64
Hi, I'm trying to run Norman anti-virus on a CentOS 3 box, x64. Is it possible? Running the binary gives me this error:

[root at server bin]# ./nvcc
-bash: ./nvcc: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

I guess I would also have to install the i386 libraries it requires. Is that possible? Regards, Ugo
2017 Jun 14
2
Separate compilation of CUDA code?
Hi, I wonder whether the current version of LLVM supports separate compilation and linking of device code, i.e., is there a flag analogous to nvcc's --relocatable-device-code flag? If not, is there any plan to support this? Thanks! Yuanfeng Peng
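
For context, a minimal sketch of what nvcc's flag enables: device code split across translation units and linked together at device-link time. The file names and functions below are hypothetical; -rdc=true is nvcc's short form of --relocatable-device-code=true.

  // a.cu -- defines a device function used by another translation unit
  __device__ int times_two(int x) { return x * 2; }

  // b.cu -- declares the external device function and calls it from a kernel
  extern __device__ int times_two(int x);
  __global__ void kernel(int *out) { out[threadIdx.x] = times_two(threadIdx.x); }

  // Build with relocatable device code, then link (the device link happens here):
  //   nvcc -rdc=true -c a.cu b.cu
  //   nvcc -rdc=true a.o b.o -o app
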
2018 Feb 20
2
use clang++ to build lulesh 2.0 failed
Hello, I'm trying to use clang++ instead of nvcc to build the LULESH 2.0 CUDA version, and compilation fails with errors like the one below:

opt/common/cuda/cuda-7.5.18/include/thrust/iterator/iterator_adaptor.h:187:5: error: expected member name or ';' after declaration specifiers
__thrust_exec_check_disable__

It looks like...
2008 Nov 04
1
Help needed using 3rd party C library/functions from within R (Nvidia CUDA)
...available through NVIDIA CUDA (www.nvidia.com/cuda) from within R. CUDA is an extension to the C language, so I thought it would be possible to do this. If I have a C file with an empty function which includes a needed CUDA library (cutil.h) and compile this to an .so file using the NVIDIA compiler (nvcc), called 'myFunc.so', I can load it fine from within R with dyn.load("myFunc.so"). But as soon as I want to call its function I get:

> dyn.load("myFunc.so")
> .C("testFunc")
Error in .C("testFunc") : C symbol name "testFunc" no...
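
One common cause of this symptom, offered here as an assumption rather than a confirmed diagnosis: nvcc compiles .cu files as C++, so testFunc gets a C++-mangled symbol name and R's .C() cannot find it under its plain name. Wrapping the function in extern "C" keeps the name unmangled; the body and build line below are a hypothetical sketch.

  // myFunc.cu -- extern "C" keeps the symbol unmangled so that
  // dyn.load("myFunc.so"); .C("testFunc") in R can resolve it
  extern "C" void testFunc(void) {
    // ... allocate device memory and launch a CUDA kernel here ...
  }

  // Hypothetical build line:
  //   nvcc -shared -Xcompiler -fPIC myFunc.cu -o myFunc.so
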
2015 Aug 21
3
[CUDA/NVPTX] is inlining __syncthreads allowed?
Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw that nvcc does this, but I'm not sure it's valid. For example:

void foo() { __syncthreads(); }

if (threadIdx.x % 2 == 0) {
  ...
  foo();
} else {
  ...
  foo();
}

Before inlining, all threads meet at one __syncthreads(). After inlining:

if (threadIdx.x % 2 == 0) {
  ...
  __syncthreads();
} else...
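
A compilable version of the situation described above (the kernel wrapper is hypothetical); the question is whether even and odd threads still synchronize correctly once each branch contains its own copy of the barrier:

  __device__ void foo() {
    __syncthreads();
  }

  __global__ void kernel() {
    if (threadIdx.x % 2 == 0) {
      foo();  // before inlining: both branches reach the single barrier in foo()
    } else {
      foo();
    }
    // After inlining, each branch holds its own __syncthreads(), so threads in
    // divergent branches wait at two textually distinct barriers.
  }
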
2016 Oct 27
3
problem on compiling cuda program with clang++
Hi all, I compiled the LLVM 3.9 source code on the NVIDIA TX1 board. Now I am following the document docs/CompileCudaWithLLVM.rst to compile a CUDA program with clang++. However, when I compile `axpy.cu` using `nvcc`, nvcc generates a correct binary; while compiling `axpy.cu` with clang++, using the command `clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_53 -L/usr/local/cuda/lib64 -lcudart_static -ldl -lrt -pthread`, clang++ generates the following error: `/usr/include/features.h:367:12: fatal e...
2018 Feb 20
0
use clang++ to build lulesh 2.0 failed
> It looks like clang++ is complaining about the Thrust library that comes with CUDA,

The Thrust library that comes with CUDA is indeed not compatible with clang. We made a number of changes to Thrust to make it work with clang (it was relying on what we considered to be bugs in nvcc), but they're only available in the upstream Thrust: https://github.com/thrust/thrust. No promises that one builds with Clang either, but at least it should.

On Tue, Feb 20, 2018 at 1:24 PM Hui Zhang via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hello,
>
> I'm trying...
2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example.

nvcc sync.cu -arch=sm_35 -ptx gives

// .globl _Z3foov
.visible .entry _Z3foov()
{
  .reg .pred %p<2>;
  .reg .s32 %r<3>;

  mov.u32 %r1, %tid.x;
  and.b32 %r2, %r1, 1;
  setp.eq.b32 %p1, %r2, 1;
  @!%p1 bra...
2012 Sep 04
2
[LLVMdev] [NVPTX] Backend cannot handle array-of-arrays constant
...);
  }
- else if (isa<ConstantAggregateZero>(CPV))
-   aggBuffer->addZeros(Bytes);
- else
-   llvm_unreachable("Unexpected Constant type");
  break;
}

Is it OK, what do you think?

Thanks,
- D.

2012/9/4 Dmitry N. Mikushin <maemarcus at gmail.com>:
> NVCC successfully handles the same IR, if we try to process the same
> .cu file with clang+nvptx and nvcc:
>
> CLANG/NVPTX:
> =============
>
> $ cat dayofweek.cu
> __attribute__((device)) char yweek[7][4] = { "MON", "TUE", "WED",
> "THU",...
2012 Sep 04
0
[LLVMdev] [NVPTX] Backend cannot handle array-of-arrays constant
NVCC successfully handles the same IR, if we try to process the same .cu file with clang+nvptx and nvcc:

CLANG/NVPTX:
=============

$ cat dayofweek.cu
__attribute__((device)) char yweek[7][4] = { "MON", "TUE", "WED",
"THU", "FRI", "SAT", "...
2013 Jul 18
2
question about Makeconf and nvcc/CUDA
Dear R development: I'm not sure if this is the appropriate list, but it's a start. I would like to put together a package which contains a CUDA program on Windows 7. I believe that it has to do with the Makeconf file in the etc directory. When I just use nvcc with the shared option, I can use the dyn.load command, but when I use the is.loaded function, it shows FALSE. Here are the results of the check command:

c:\PROGRA~1\R\R-3.0.1\bin\i386>R CMD check cudasize
R CMD check cudasize
* using log directory 'c:/PROGRA~1/R/R-3.0.1/bin/i386/c...
2018 Jun 21
2
NVPTX - Reordering load instructions
Hi all, I'm looking into the performance difference of a benchmark compiled with NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a significant difference due to PTX instruction ordering. The relevant source code consists of two nested loops that get fully unrolled, doing some basic arithmetic with values loaded from shared memory:

> #define BLOCK_SIZE 16
> ...
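
The post's kernel is truncated above; as a purely hypothetical sketch of the shape being described, a fully unrolled pair of loops over a shared-memory tile (assuming a 16x16 thread block), where the final ordering of the ld.shared instructions is left to the compiler:

  #define BLOCK_SIZE 16

  __global__ void tile_kernel(const float *in, float *out) {
    __shared__ float tile[BLOCK_SIZE][BLOCK_SIZE];
    tile[threadIdx.y][threadIdx.x] = in[threadIdx.y * BLOCK_SIZE + threadIdx.x];
    __syncthreads();

    float acc = 0.0f;
    #pragma unroll
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      #pragma unroll
      for (int j = 0; j < BLOCK_SIZE; ++j) {
        // basic arithmetic on values loaded from shared memory
        acc += tile[i][threadIdx.x] * tile[threadIdx.y][j];
      }
    }
    out[threadIdx.y * BLOCK_SIZE + threadIdx.x] = acc;
  }
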
2012 Sep 06
0
[LLVMdev] [NVPTX] Backend cannot handle array-of-arrays constant
...- else
> -   llvm_unreachable("Unexpected Constant type");
>   break;
> }
>
> Is it OK, what do you think?

What is the type of CPV that you are seeing?

> Thanks,
> - D.
>
> 2012/9/4 Dmitry N. Mikushin <maemarcus at gmail.com>:
>> NVCC successfully handles the same IR, if we try to process the same
>> .cu file with clang+nvptx and nvcc:
>>
>> CLANG/NVPTX:
>> =============
>>
>> $ cat dayofweek.cu
>> __attribute__((device)) char yweek[7][4] = { "MON", "TUE", "WED"...
2012 Sep 03
2
[LLVMdev] [NVPTX] Backend cannot handle array-of-arrays constant
Dear all, Looks like the NVPTX backend cannot handle an array-of-arrays constant (please see the repro case below). Is it supposed to work? Any ideas how to get it working? This is important for our target applications. Thanks, - Dima.

$ cat test.ll
; ModuleID = '__kernelgen_main_module'
target datalayout =
2016 Oct 14
2
LLVM/CLANG: CUDA compilation fail for inline assembly code
...ia.com/default/topic/481465/cuda-programming-and-performance/any-way-to-know-on-which-sm-a-thread-is-running-/2/?offset=21#4996171> ):

static __device__ uint get_smid(void) {
  uint ret;
  asm("mov.u32 %0, %smid;" : "=r"(ret));
  return ret;
}

The original makefile invokes the nvcc compiler with the flag -Xptxas -v, and it compiles with nvcc. LLVM has -Xcuda-ptxas <arg>, which I believe is the comparable option for compiling PTX code. I get the following error when I try compiling (clang 4.0):

../../include/cutil_subset.h:23:25: error: invalid % escape in inline ass...
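
A likely fix, offered as an assumption based on the error text rather than anything confirmed in the thread: clang treats '%' in GCC-style inline asm templates as an escape character, so a literal PTX register name needs a doubled percent sign. The same helper in that form:

  // '%%smid' emits a literal '%smid' into the PTX, while '%0' stays an
  // operand placeholder; nvcc accepts this doubled form as well
  static __device__ unsigned int get_smid(void) {
    unsigned int ret;
    asm("mov.u32 %0, %%smid;" : "=r"(ret));
    return ret;
  }
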
2016 Oct 27
0
problem on compiling cuda program with clang++
...t 27, 2016 at 11:49 AM, 李阳 <liyang.cs.cqu at gmail.com> wrote:
> Yes, I am following the document `CompileCUDAWithLLVM.rst`.
>
> I want to use `clang++` to compile the CUDA program and generate an
> intermediate representation (IR). However, what I have found is that
> `nvcc` compiles correctly and `clang++` does not.
>
> Can `nvcc` generate IR code or PTX-level instructions? Thanks!
>
> 2016-10-28 2:02 GMT+08:00 Justin Lebar <jlebar at google.com>:
>>
>> Hi, it looks like you're compiling CUDA for an A...
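
As an aside not stated in the thread: clang can emit the device-side IR or PTX directly. Something along these lines (reusing the GPU arch from the earlier command; output file names are arbitrary) should work:

  $ clang++ axpy.cu --cuda-device-only --cuda-gpu-arch=sm_53 -S -emit-llvm -o axpy.ll   # device-side LLVM IR
  $ clang++ axpy.cu --cuda-device-only --cuda-gpu-arch=sm_53 -S -o axpy.ptx             # device-side PTX
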
2015 Apr 08
5
[LLVMdev] CUDA front-end (CUDA to LLVM IR)
Hi, I wanted to ask whether there is an ongoing effort (or an already established tool) that enables converting CUDA kernels (which use CUDA-specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I am aware that I can do this for OpenCL with the help of libclc, but I cannot find something similar for CUDA. Thanks
2018 Jun 21
2
NVPTX - Reordering load instructions
...e.

Justin

On Thu, Jun 21, 2018, 7:48 AM Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 06/21/2018 12:18 PM, Tim Besard via llvm-dev wrote:
> > Hi all,
> >
> > I'm looking into the performance difference of a benchmark compiled with
> > NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a
> > significant difference due to PTX instruction ordering. The relevant
> > source code consists of two nested loops that get fully unrolled, doing
> > some basic arithmetic with values loaded from shared memory:
> ...
2017 Jun 09
1
NVPTX Back-end: relocatable device code support for dynamic parallelism
Hi everyone, CUDA allows some runtime functions to be called from device code as well. On a multi-GPU system this lets the GPU determine its own device id via cudaGetDevice(). Unfortunately I cannot get it working when compiling with clang. When compiling with nvcc, relocatable device code needs to be enabled (-rdc=true) and the cudadevrt library is needed when linking [0]. I did not find such a switch to turn on rdc for clang. Just compiling does not work, as ptxas does not find the function cudaGetDevice(). My guess is that this feature is not supported. Does...
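
For reference, a minimal sketch of the device-side runtime call in question (the kernel is hypothetical); with nvcc this builds only when relocatable device code is enabled and the device runtime library is linked:

  // Build (nvcc): nvcc -rdc=true kernel.cu -lcudadevrt -o app
  __global__ void who_am_i(int *out) {
    int dev;
    cudaGetDevice(&dev);  // CUDA runtime call executed on the GPU
    *out = dev;
  }
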
2020 Mar 31
2
Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Hi Johannes: 1. Attached is the submitted PDF. 2. I have a notes section where I state: I am still unsure of the GPU extension I proposed, as I don't know how LLVM plays into the GPU crossover space the way nvcc (NVIDIA's compiler, which integrates gcc and PTX) does. I don't know whether the function graphs in the CPU+GPU name spaces are seamless/continuous within nvcc, or whether nvcc is just a wrapper that invokes gcc on the CPU sources and ptx on the GPU sources. So what I have said is - if there is...