Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] Cuda programs on LLVM"
2011 Aug 15
0
[LLVMdev] Cuda programs on LLVM
Hi Adarsh,
to my knowledge there is no publicly available CUDA frontend for LLVM yet.
The work of Helge Rhodin you mentioned is on the backend side: it generates
PTX code from LLVM IR. It is still being maintained,
although I think the currently available source code is a little outdated.
There is also a PTX backend in the current version of LLVM that makes
use of LLVM's
2010 Apr 27
5
[LLVMdev] PTX target for LLVM!
Hey everybody,
good news for everyone interested in the PTX backend:
We decided to release the current source code under the GPL - you can
find the latest tarball here:
http://www.prog.uni-saarland.de/projects/anysl
You will find the README in the attachment, which should hopefully
answer a lot of questions concerning the implementation and the current
status.
If you have further questions,
2011 Aug 29
0
[LLVMdev] PTX target for LLVM!
Hi everyone,
I downloaded the latest version of LLVM PTX backend from
http://www.prog.uni-saarland.de/projects/anysl
and made the required changes to all the files mentioned in the README. But
I get the following error when I compile it.
llvm[3]: Compiling PTXBackend.cpp for Release build
In file included from PTXBackend.h:70:0,
from PTXBackend.cpp:36:
PTXPasses.h: In constructor
2016 Oct 14
2
LLVM/CLANG: CUDA compilation fail for inline assembly code
Hi,
I am sorry for sending this query again here, but maybe I sent it to the wrong
list yesterday.
I am trying to compile LonestarGPU-rev2.0
<http://iss.ices.utexas.edu/?p=projects/galois/lonestargpu/download>
benchmark suite with LLVM/CLANG.
This suite has the following piece of code (more info here
2012 Nov 08
3
[LLVMdev] translating from OpenMP to CUDA
Hi,
Is it possible to translate an OpenMP program to CUDA using LLVM? I read that dragonegg has an OpenMP front-end and LLVM has a PTX back-end. I don't know how mature these tools are. Please let me know. Thanks.
-Apala
Postdoctoral Scholar
Department of Computer Science, University of Chicago
Computation Institute, Argonne National Laboratory
http://sites.google.com/site/apalaguha/home/
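As a rough illustration of what such a translation involves (not of how DragonEgg or any LLVM back-end actually does it), the sketch below puts an OpenMP-style parallel loop next to a host-side emulation of the CUDA indexing scheme, where each loop iteration becomes one thread of a 1-D grid. The names axpy_openmp, axpy_thread and axpy_grid_emulated are hypothetical and exist only for this example; the code is plain C++ and runs on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The kind of loop an OpenMP front-end would parallelize. The pragma is
// only emitted when the compiler has OpenMP enabled; otherwise the loop
// simply runs serially with the same result.
void axpy_openmp(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
#if defined(_OPENMP)
    #pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical stand-in for a CUDA kernel body: one call handles the single
// element a GPU thread would own, i.e. i = blockIdx.x * blockDim.x + threadIdx.x.
void axpy_thread(std::size_t block_idx, std::size_t block_dim,
                 std::size_t thread_idx, float a,
                 const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t i = block_idx * block_dim + thread_idx;
    if (i < x.size()) {  // guard: the last block may be only partly used
        y[i] = a * x[i] + y[i];
    }
}

// Emulates a 1-D grid launch serially on the host (assumption: a real
// launch would run these calls concurrently on the device).
void axpy_grid_emulated(float a, const std::vector<float>& x,
                        std::vector<float>& y, std::size_t block_dim) {
    assert(x.size() == y.size());
    assert(block_dim > 0);
    const std::size_t grid_dim = (x.size() + block_dim - 1) / block_dim;
    for (std::size_t b = 0; b < grid_dim; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            axpy_thread(b, block_dim, t, a, x, y);
        }
    }
}
```

Both functions compute the same result; the second only makes explicit the iteration-to-thread mapping that a translator would have to generate, along with the bounds guard needed when the array length is not a multiple of the block size.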
2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example.
nvcc sync.cu -arch=sm_35 -ptx
gives
// .globl _Z3foov
.visible .entry _Z3foov(
)
{
.reg .pred %p<2>;
.reg .s32 %r<3>;
mov.u32 %r1, %tid.x;
and.b32 %r2, %r1, 1;
setp.eq.b32 %p1, %r2, 1;
@!%p1 bra BB7_2;
bra.uni
2012 Nov 09
0
[LLVMdev] translating from OpenMP to CUDA
The PTX back-end is robust (it's based on the sources used by nvcc), but
I'm not sure about the OpenMP representation in LLVM IR. I believe the
OpenMP constructs are already lowered into libgomp calls before leaving
DragonEgg. It's been a while since I've looked at it though.
If you use the PTX back-end and have any issues, please don't hesitate to
post to the list and cc:
2016 Jun 02
3
PTX generation from CUDA file for compute capability 1.0 (sm_10)
Hello Bergström/Eric,
Thanks for the reply. The G80 (sm_10) architecture was ported to an FPGA by a
group of researchers (http://www.ecs.umass.edu/ece/tessier/andryc-fpt13.pdf).
Our group has further research interest in this work. I have been working
on modifying Clang-LLVM for a couple of months and have made the
required changes. But Clang-LLVM only allows me to generate PTX for
sm_20,
2016 Oct 27
3
problem on compiling cuda program with clang++
Hi all,
I compiled the *llvm3.9* source code on the *Nvidia TX1* board. And now I
am following the document docs/CompileCudaWithLLVM.rst to compile
a CUDA program with clang++.
However, when I compile `axpy.cu` using `nvcc`, *nvcc* generates the
correct binary;
while compiling `axpy.cu` using clang++, the detailed command is `clang++
axpy.cu -o axpy --cuda-gpu-arch=sm_53
2010 Oct 07
1
[LLVMdev] Status of PTX Backend
Hi,
The PTX backend we developed (CBackend approach, does not use the target
independent code generator) is already more advanced.
An older version is published here:
http://sourceforge.net/projects/llvmptxbackend/
We recently eliminated a bug which increased the number of required
registers per thread. Surprisingly, without that bug the generated code
is already comparable to code generated
2010 Aug 10
1
[LLVMdev] PTX backend, BSD license
On Tue, 10 Aug 2010 14:21:43 -0500
"Villmow, Micah" <Micah.Villmow at amd.com> wrote:
>
> > -----Original Message-----
> > From: llvmdev-bounces at cs.uiuc.edu
> > [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of David A. Greene
> > Sent: Tuesday, August 10, 2010 12:05 PM
> > To: Helge Rhodin
> > Cc: llvmdev at cs.uiuc.edu
> >
2017 Sep 27
2
OrcJIT + CUDA Prototype for Cling
Dear LLVM-Developers and Vinod Grover,
we are trying to extend the cling C++ interpreter
(https://github.com/root-project/cling) with CUDA functionality for
Nvidia GPUs.
I have already developed a prototype based on OrcJIT and am seeking
feedback. I am currently stuck on a runtime issue in which my
interpreter prototype fails to execute kernels with a CUDA runtime error.
=== How to use the
2010 Aug 10
0
[LLVMdev] PTX backend, BSD license
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of David A. Greene
> Sent: Tuesday, August 10, 2010 12:05 PM
> To: Helge Rhodin
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] PTX backend, BSD license
>
> Helge Rhodin <helge.rhodin at alice-dsl.net> writes:
>
> >> But I
2010 Aug 10
4
[LLVMdev] PTX backend, BSD license
Helge Rhodin <helge.rhodin at alice-dsl.net> writes:
>> But I didn't study their code thoroughly, so I might be wrong about this.
>>
> Yes, we don't use the target-independent code generator and the
> backend is based on the CBackend. We decided to not use the code
> generator because PTX code is also an intermediate language. The
> graphics driver
2013 Jul 18
2
question about Makeconf and nvcc/CUDA
Dear R development:
I'm not sure if this is the appropriate list, but it's a start.
I would like to put together a package which contains a CUDA program on Windows 7. I believe that it has to do with the Makeconf file in the etc directory.
When I just use nvcc with the shared option, I can use the dyn.load command, but the is.loaded function then shows FALSE.
2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
Hi,
I needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck.
Specifically, below is what I did:
1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu;
2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated
2017 Nov 14
1
OrcJIT + CUDA Prototype for Cling
Hi Lang,
thank you very much. I've used your code, and creating the object
file works. I think the problem occurs after creating the object file. When
I link the object file with ld I get an executable which works correctly.
After changing the clang and llvm libraries from the packaged
version (.deb) to a self-compiled version with debug options, I get an
assert() failure.
In
void
2010 Mar 27
2
[LLVMdev] PTX target for LLVM?
Hi
I am interested to know: are there any LLVM targets in the works
for Nvidia's PTX ISA?
Also if anyone knows about Ocelot (a project done by some students at
my school): it does the opposite of what I am trying to do (translates
PTX to LLVM IR to run Cuda kernels on the CPU).
Thanks in advance.
-Puyan
2011 Sep 03
2
[LLVMdev] PTX optimizations
Hi everyone,
I am trying to add some optimizations to LLVM's PTX backend, but I am
unaware of the existing optimizations. Could you please point me to
them?
Thank you :)
2018 Jun 21
2
NVPTX - Reordering load instructions
Hi all,
I'm looking into the performance difference of a benchmark compiled with
NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a
significant difference due to PTX instruction ordering. The relevant
source code consists of two nested loops that get fully unrolled, doing
some basic arithmetic with values loaded from shared memory:
> #define BLOCK_SIZE 16
>
>