Displaying 20 results from an estimated 400 matches similar to: "Missing TargetPrefix for NVVM intrinsics"
2015 Jan 12
3
[LLVMdev] Emitting IR in older formats (for NVVM)
This question is specifically motivated by the practical constraints of
NVVM, but I don't know anywhere better to ask (hopefully, e.g.,
@jholewinski is still following), and I believe it concerns general LLVM
issues:
NVIDIA's libNVVM is built on LLVM 3.2. This means its bitcode and LL text
parsers are from that generation. It's interface calls for adding modules
as either bitcode
2015 Jan 13
2
[LLVMdev] Emitting IR in older formats (for NVVM)
Since SPIR can be (easily) transformed to NVVM IR at least for me this helps a lot.
Thank you Tobias.
-MH
On January 12, 2015, Tobias Grosser <tgrosser at inf.ethz.ch> wrote:
> On 12.01.2015 05:48, Jonathan Ragan-Kelley wrote:
> > This question is specifically motivated by the practical constraints of
> > NVVM, but I don't know anywhere better to ask (hopefully, e.g.,
2015 Jan 13
2
[LLVMdev] Emitting IR in older formats (for NVVM)
Thanks, all.
I didn’t realize a 7.0 RC was public and changed to 3.4—I will go down that road for now, though I’ll probably also look into integrating variants of the SPIR converter in the future.
Another possibility is to skip libnvvm altogether and use LLVM's NVPTX target. This is of course harder since you have to configure the passes yourself instead of just calling a few C
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
I have written test.ll as below and ran 'opt' on it as
" opt -std-compile-opts test.ll -S -o -" . But the output shows that there
is code motion around the barrier intrinsics.
test.ll
-------
; ModuleID = 'test.bc'
define void @test(i16* %I_0, i16* %I_1, i16* %I_2, i16* %I_3, i16* %O_0) {
entry:
%T_0 = load volatile i16* %I_0
%T_1 = load volatile i16* %I_1
%T_2 =
2012 May 01
2
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
> -----Original Message-----
> From: Dan Bailey [mailto:dan at dneg.com]
> Sent: Sunday, April 29, 2012 8:46 AM
> To: Justin Holewinski
> Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu; Vinod Grover;
> llvmdev at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend
>
> Justin,
>
> Firstly, this is great! It seems to be so much further forward in
2012 May 02
0
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Justin Holewinski wrote:
<blockquote
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
is there any guarantee that the nvptx intrinsic "llvm.nvvm.barrier0" will
not be moved around by opt ?
In other words, can I expect all the instructions above
"llvm.nvvm.barrier0" to remain above it and those below it to remain below,
after all the opt passes are run ?
If that is not the case, is there a way to define such an intrinsic ?
Thanks.
-------------- next part
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue,
Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect
anchor didn't go away; ptxas is still complaining about the duplicate
definition of of function '_ZL21__nvvm_reflect_anchorv' . Did I misused
the nvvm-reflect pass?
Thanks!
yuanfeng
On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote:
> According to the examples you
2016 Mar 13
2
instrumenting device code with gpucc
Hey Jingyue,
Thanks for being so responsive! I finally figured out a way to resolve the
issue: all I have to do is to use `-only-needed` when merging the device
bitcodes with llvm-link.
However, since we actually need to instrument the host code as well, I
encountered another issue when I tried to glue the instrumented host code
and fatbin together. When I only instrumented the device code, I
2016 Mar 15
2
instrumenting device code with gpucc
Hi Jingyue,
Sorry to ask again, but how exactly could I glue the fatbin with the
instrumented host code? Or does it mean we actually cannot instrument both
the host & device code at the same time?
Thanks!
yuanfeng
On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote:
> Including fatbin into host code should be done in frontend.
>
> On Mon, Mar 14, 2016
2015 Apr 08
5
[LLVMdev] CUDA front-end (CUDA to LLVM IR)
Hi,
I wanted to ask whether there is ongoing effort (or an already established
tool) that enables to convert CUDA kernels (that uses CUDA specific
intrinsics, e.g., threadId.x, __syncthreads(), ...) to LLVM IR. I am aware
that I can do this for OpenCL with the help of libclc but I can not find
something similar for CUDA.
Thanks
-------------- next part --------------
An HTML attachment was
2020 Sep 23
2
Information about the number of indices in memory accesses
Hi all,
For loads and stores i want to extract information about the number of
indices accessed. For instance:
struct S {int X, int *Y};
__global__ void kernel(int *A, int **B, struct S) {
int x = A[..][..]; // -> L: A[..][..]
int y = *B[2]; // -> L: B[0][2]
int z = S.y[..]; // -> L: S.1[..]
// etc..
}
I am performing some preprocessing on IR to:
1. Move constant
2013 Mar 01
1
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
The identifier INT_PTX_SREG_TID_X is the name of an instruction as the
back-end sees it, and has very little to do with the name you should use in
your IR. Your best bet is to look at the include/llvm/IR/IntrinsicsNVVM.td
file and see the definitions for each intrinsic. Then, the name mapping is
just:
int_foo_bar -> llvm.foo.bar()
int_ prefix becomes llvm., and all underscores turn into
2016 Mar 10
4
instrumenting device code with gpucc
It's hard to tell what is wrong without a concrete example. E.g., what is
the program you are instrumenting? What is the definition of the hook
function? How did you link that definition with the binary?
One thing suspicious to me is that you may have linked the definition of
_Cool_MemRead_Hook as a host function instead of a device function. AFAIK,
PTX assembly cannot be linked. So, if you
2018 Feb 01
1
Intrinsic pattern matching
Hello,
I have a problem with pattern matching on intrinsics.
I have following code in IntrinsicsX86.td:
```
let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
def int_x86_mpx_bndmk:
Intrinsic<[llvm_x86bnd_ty], [llvm_ptr_ty, llvm_i64_ty], []>;
}
```
And following instruction that is generated when @llvm.x86.mpx.bndmk is
used in code:
2013 Mar 11
2
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all,
Attached notunrolled.ll is a module containing reduction kernel. What I'm
trying to do is to unroll it in such way, that partial reduction on
unrolled iterations would be performed on register, and then stored to
memory only once. Currently llvm's unroller together with all standard
optimizations produce code, which stores value to memory after every
unrolled iteration, which is
2012 Jul 11
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hello,
FYI, this is a bug http://llvm.org/bugs/show_bug.cgi?id=13324
When compiling the following code for sm_20, func params are by some reason
given with .align 0, which is invalid. Problem does not occur if compiled
for sm_10.
> cat test.ll
; ModuleID = '__kernelgen_main_module'
target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple =
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
I tried to manually assign each of 3 arrays a unique TBAA node. But it does
not seem to help: alias analysis still considers arrays as may-alias, which
most likely prevents the desired optimization. Below is the sample code
with TBAA metadata inserted. Could you please suggest what might be wrong
with it?
Many thanks,
- D.
marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa
2020 Oct 03
2
Information about the number of indices in memory accesses
Hi Ees,
SCEV Delinearization is the closest I know. But it has its problems. Well
for one your expression should be SCEVable.
But more importantly, SCEV Delinearization is trying to deduce something
that is high-level (actually source-level) from a low-level IR in which a
lot of this info has been lost. So, since there's not a 1-1 mapping from
high-level code to LLVM IR, going backwards will
2012 Apr 27
2
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
Thanks for the feedback!
The attached patch addresses the style issues that have been found.
From: Jim Grosbach [mailto:grosbach at apple.com]
Sent: Wednesday, April 25, 2012 2:22 PM
To: Justin Holewinski
Cc: llvm-commits at cs.uiuc.edu; llvmdev at cs.uiuc.edu; Vinod Grover
Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend
Hi Justin,
Cool stuff, to be sure. Excited to see this.
As a