thr3ads.net - similar to: "use clang++ to build lulesh 2.0 failed"

Displaying 20 results from an estimated 1000 matches similar to: "use clang++ to build lulesh 2.0 failed"

use clang++ to build lulesh 2.0 failed

2018 Feb 20

> It looks like clang++ is complaining about the Thrust library that comes with CUDA. The Thrust library that comes with CUDA is indeed not compatible with clang. We made a number of changes to Thrust to make it work with clang (it was relying on what we considered to be bugs in nvcc), but they're only available in the upstream Thrust: https://github.com/thrust/thrust. No promises that one

problem on compiling cuda program with clang++

2016 Oct 27

Hi all, I compiled the *llvm3.9* source code on the *Nvidia TX1* board. And now I am following the document in the docs/CompileCudaWithLLVM.rst to compile cuda program with clang++. However, when I compile `axpy.cu` using `nvcc`, *nvcc* can generate the correct binary; while compiling `axpy.cu` using clang++, the detailed command is `clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_53

problem on compiling cuda program with clang++

2016 Oct 27

> NVidia TX1 is the AArch64 Jetson board with proper GPU (we use those). Sure, I believe that others use this configuration. I was saying, "we", being, myself and those whom I work closely with, do not. Sorry if that wasn't precise. It is still not clear to me if the original poster is compiling for ARM or not. But it sounds like you're going to help them get this

problem on compiling cuda program with clang++

2016 Oct 27

On 27 October 2016 at 19:02, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Hi, it looks like you're compiling CUDA for an ARM host? This is not > a configuration we have tested, nor is it something we have the > capability of testing at the moment. Hi Justin, NVidia TX1 is the AArch64 Jetson board with proper GPU (we use those). > You may be able to

Running a 32-bit application on CentOS3-x64

2006 Nov 09

Hi, I'm trying to run Norman anti-virus on a CentOS 3 box, x64. Is it possible? Running the binary gives me this error: [root at server bin]# ./nvcc -bash: ./nvcc: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory I guess I would have to install i386 libraries that it requires, as well. Is it possible? Regards, Ugo

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...

problem on compiling cuda program with clang++

2016 Oct 27

(+llvm-dev) My question was whether your host machine, the one which is running the compiler, is ARM (as opposed to x86 or POWER). The header you pointed to was in "aarch64-linux-gnu", which made me think you might be on an ARM system. If you are not running linux x86, it is not likely to work. If you are running linux x86, we will need much more details about your system in order to

Separate compilation of CUDA code?

2017 Jun 14

Hi, I wonder whether the current version of LLVM supports separate compilation and linking of device code, i.e., is there a flag analogous to nvcc's --relocatable-device-code flag? If not, is there any plan to support this? Thanks! Yuanfeng Peng

[LLVMdev] [NVPTX] Backend cannot handle array-of-arrays constant

2012 Sep 03

Dear all, It looks like the NVPTX backend cannot handle array-of-arrays constants (please see the repro case below). Is it supposed to work? Any ideas how to get it working? It is important for our target applications. Thanks, - Dima. $ cat test.ll ; ModuleID = '__kernelgen_main_module' target datalayout =

Help needed using 3rd party C library/functions from within R (Nvidia CUDA)

2008 Nov 04

Hello, I'm trying to combine the parallel computing power available through NVIDIA CUDA (www.nvidia.com/cuda) from within R. CUDA is an extension to the C language, so I thought it would be possible to do this. If I have a C file with an empty function which includes a needed CUDA library (cutil.h) and compile this to an .so file using a NVIDIA compiler (nvcc), called 'myFunc.so' I

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2; bra.uni

[LLVMdev] [NVPTX] Backend cannot handle array-of-arrays constant

2012 Sep 04

I think our test case demonstrates that requiring the array item being initialized to be constant is incorrect. NVPTX does not crash anymore and produces correct result with the following change: --- NVPTXAsmPrinter.cpp 2012-09-03 15:14:00.000000000 +0200 +++ NVPTXAsmPrinter.cpp 2012-09-04 15:47:17.859398193 +0200 @@ -1890,17 +1890,15 @@ case Type::ArrayTyID: case Type::VectorTyID: case

[LLVMdev] `Ty && "Trying to add a type that doesn't exist?

2015 Jun 04

Upgrade clang? I can't reproduce it with trunk. On 4 June 2015 at 14:48, Hui Zhang <wayne.huizhang at gmail.com> wrote: > Yes, I found this link, but what's the solution?? > > On Thu, Jun 4, 2015 at 1:09 PM, Rafael Espíndola > <rafael.espindola at gmail.com> wrote: >> >> I think this is https://llvm.org/bugs/show_bug.cgi?id=16846 >> >> On

how to add the location debug info for each instruction

2015 Nov 04

> On Nov 3, 2015, at 5:00 PM, Hui Zhang <wayne.huizhang at gmail.com> wrote: > > Hello, > > I found a weird thing in llvm 3.3: > > For exactly the same MDNode *space, if I cast it to DILocation loc(space) and call loc.getFileName(), or I cast it to DIScope sco(space) and call sco.getFilename(), the return value would be different ! Totally two different files

[LLVMdev] CUDA front-end (CUDA to LLVM IR)

2015 Apr 08

Hi, I wanted to ask whether there is an ongoing effort (or an already established tool) that makes it possible to convert CUDA kernels (that use CUDA-specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I am aware that I can do this for OpenCL with the help of libclc, but I cannot find something similar for CUDA. Thanks

density function

2005 May 10

Hi, I wonder if the function "density" outputs the Gaussian mixture formula that is estimated from the input data, assuming a Gaussian model is used at each data point? I want to take the derivative of the finally estimated Gaussian mixture formula for further analysis. Thanks in advance for any help that you can offer me! Hui

pad leading zeros in front of strings

2012 May 22

Dear All, This question sounds very simple but I don't know where I am wrong. I just want to pad leading zeros in some string, for example, "123" becomes "00123". What is wrong if I do following? > sprintf("%05s", "123") [1] " 123" It didn't return "00123", instead it padded with 'blank'. Thank you for your help

NVPTX - Reordering load instructions

2018 Jun 21

Hi all, I'm looking into the performance difference of a benchmark compiled with NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a significant difference due to PTX instruction ordering. The relevant source code consists of two nested loops that get fully unrolled, doing some basic arithmetic with values loaded from shared memory: > #define BLOCK_SIZE 16 > >

help with the usage of "randomForest"

2004 Mar 31

Dear all, Can anybody give me some hint on the following error message I got when using randomForest? I have a two-class classification problem. The data file "sample" is:
----------------------------------------------------------
  udomain.edu udomain.hcs hpclass
1 1.0000      1           not
2 NA          2           not
3 NA          0.8         not
4 NA          0.2         hp
5 NA          0.9         hp
------------------------------------------------------------
The

longer object length, is not a multiple of shorter object length in: kappa * gcounts

2005 Apr 18

Hi, I was using a density estimation function as follows: > est <- KernSmooth::bkde(x3, bandwidth=10) When setting bandwidth less than 5, I got the error "longer object length, is not a multiple of shorter object length in: kappa * gcounts ". I wonder if there is anybody who can explain the error for me? Thanks! Hui