Displaying 20 results from an estimated 32 matches for "ptx_kernel".
2017 Jun 22
2
Legal names for Functions and other Identifiers
Thanks for the heads up, Philip!
I did come across a strange case where LLVM allowed "%" to be a part of a
function's name. This was in the context of my patch
https://reviews.llvm.org/D33985, where I prefix the name of the source
function and the Scop (a special kind of Region that Polly can optimize;
the name of the Scop is the name of the Region) to the name of the PTX
kernel
2012 May 15
1
[LLVMdev] NVPTX: why ret instruction is not translated to exit in kernel function?
Hi Justin,
In the PTX backend, the "ret" instruction at the end of a ptx_kernel function
is translated to an "exit" instruction. A test case named exit.ll demonstrates this.
But in the NVPTX backend, it seems that you didn't do such a translation.
Why did you choose this? Is this due to changes in NVIDIA PTX itself?
Thanks,
Yabin
-------------- next part ---------...
2012 May 07
0
[LLVMdev] NVPTX annotation metadata emission
This new metadata format is currently optional. The old ptx_kernel calling convention should still work.
The only thing you should have to change when converting from PTX -> NVPTX is the address space map. The calling conventions and intrinsics should be compatible with both.
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:l...
2012 May 07
2
[LLVMdev] NVPTX annotation metadata emission
Hi everybody,
I have noticed that the new NVPTX backend requires new metadata to identify
the kernels in the module:
define void @metadata_kernel(float* %a) {
ret void
}
!nvvm.annotations = !{!1}
!1 = metadata !{void (float*)* @metadata_kernel, metadata !"kernel", i32 1}
Is clang going to support the emission of this metadata soon? Or do I have
to write it on my own? :)
Thanks,
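For anyone who does end up writing the annotations by hand, the text can be generated mechanically. The following is only a minimal sketch in Python: the helper name nvvm_kernel_annotations and its (name, signature) input format are hypothetical, and it reproduces the 2012-era metadata syntax quoted in the snippet above, which newer LLVM releases no longer accept verbatim.

```python
def nvvm_kernel_annotations(kernels):
    """Build textual !nvvm.annotations metadata for a list of kernels.

    `kernels` is a list of (function_name, function_type) pairs, e.g.
    ("metadata_kernel", "void (float*)"). This helper is hypothetical and
    emits the old syntax from the snippet above (explicit `metadata`
    keyword, typed function pointers).
    """
    nodes = []
    for index, (name, signature) in enumerate(kernels, start=1):
        # One metadata node per kernel: function pointer, the string
        # "kernel", and the i32 value 1 marking it as a kernel entry point.
        nodes.append(
            f'!{index} = metadata !{{{signature}* @{name}, metadata !"kernel", i32 1}}'
        )
    # The named metadata node lists every per-kernel node.
    refs = ", ".join(f"!{i}" for i in range(1, len(kernels) + 1))
    return "\n".join([f"!nvvm.annotations = !{{{refs}}}"] + nodes)


if __name__ == "__main__":
    print(nvvm_kernel_annotations([("metadata_kernel", "void (float*)")]))
```

Called with the single kernel from the snippet, this prints the same two metadata lines shown there; the function definitions themselves still have to be emitted separately.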
2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
...-----------------------------------------------------------------------
> target datalayout = ...
> target triple = "x86_64-unknown-linux-gnu"
>
> @llvm_kernel = private unnamed_addr constant llvm_kernel {
> target triple = nvptx64-unknown-unknown
> define internal ptx_kernel void @gpu_kernel(i8* %Array) {
> ...
> }
> }
> ------------------------------------------------------------------------
>
> By default the global will be compiled to an LLVM string stored in the
> object file. We could also think about translating it to PTX or AMD's...
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin,
the non-translatable IR with GPU code replaced by appropriate CUDA Driver
> API calls.
One of the CUDA driver APIs (cuLaunch) needs a PTX asm string as its input. So
if I want to provide a one-touch solution and don't introduce any changes
to tools outside Polly, I must prepare the PTX string before I can generate
the correct non-translatable IR part.
As you suggested, it may
2012 Jul 26
6
[LLVMdev] [PROPOSAL] LLVM multi-module support
...dules
as global variables.
------------------------------------------------------------------------
target datalayout = ...
target triple = "x86_64-unknown-linux-gnu"
@llvm_kernel = private unnamed_addr constant llvm_kernel {
target triple = nvptx64-unknown-unknown
define internal ptx_kernel void @gpu_kernel(i8* %Array) {
...
}
}
------------------------------------------------------------------------
By default the global will be compiled to an LLVM string stored in the
object file. We could also think about translating it to PTX or AMD's
HSA-IL, such that e.g. PTX can b...
2011 Oct 13
3
[LLVMdev] RFC: Representation of OpenCL Memory Spaces
...g in the following kernel:
__kernel
void foo(__global float* a, __local float* b) {
b[0] = a[0];
}
If we compile this with Clang targeting PTX, the resulting LLVM IR will be:
target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx32--"
define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)*
nocapture %b) nounwind noinline {
entry:
%0 = load float* %a, align 4, !tbaa !1
store float %0, float addrspace(4)* %b, align 4, !tbaa !1
ret void
}
!opencl.kernels = !{!0}
!0 = metadata !{void (float*, float addrspace(4)*)* @foo}
!1 = meta...
2012 May 16
2
[LLVMdev] NVPTX: __iAtomicCAS support ?
...;
target triple = "ptx64-unknown-unknown"
%struct.kernelgen_callback_t = type { i32, i32,
%"struct.kernelgen::kernel_t"*, i32, i32,
%struct.kernelgen_callback_data_t* }
%"struct.kernelgen::kernel_t" = type opaque
%struct.kernelgen_callback_data_t = type opaque
define ptx_kernel void @_Z17kernelgen_monitorPi(i32* %callback) nounwind {
entry:
%callback.addr = alloca i32*, align 8
store i32* %callback, i32** %callback.addr, align 8
%0 = load i32** %callback.addr, align 8
%1 = bitcast i32* %0 to %struct.kernelgen_callback_t*
%lock = getelementptr inbounds %struct.ke...
2012 Apr 03
2
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin,
2012/4/3 Justin Holewinski <justin.holewinski at gmail.com>
> *Motivation*
>> With the broad proliferation of GPU computing, it is very important to
>> provide an easy and automatic tool to develop or port the applications to
>> GPU for normal developers, especially for those domain experts who want to
>> harness the huge computing power of GPU. Polly
2012 Apr 02
6
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
...t various parallel loops which can be described by
Polly's polyhedral model. We first translated the selected SCoPs (Static
Control Parts) into 4-depth loops with Polly's schedule optimization. Then
we extract the loop body (or inner non-parallel loops) into an LLVM
sub-function, tagged with the PTX_Kernel or PTX_Device calling convention. After
that, we use the PTX backend to translate the sub-functions into a string of the
corresponding PTX code. Finally, we provide a runtime library to generate
the executable program.
There are three key challenges in this project.
1. How to get the optimal execut...
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
...'
target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-unknown-unknown"
@__kernelgen_version = constant [15 x i8] c"0.2/1654:1675M\00"
define ptx_kernel void @__kernelgen_matvec_loop_7(i32* nocapture) #0 {
"Loop Function Root":
%tid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.tid.x()
%ctaid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
%PositionOfBlockInGrid.x = shl i32 %ctaid.x, 9
%BlockLB.Add.ThreadPo...
2013 Jan 31
0
[LLVMdev] intptr_t support in llvm
...tion ptrtoint). Or is there a better way to
make sure that the semantics of intptr_t are preserved in clang/llvm for
all archs?
The code I tried to compile is
__kernel void intptr_t_kernel(int *a, int *b)
{
intptr_t c = (intptr_t)a;
b = (int*) c;
*a += b;
}
and the code generated is
define ptx_kernel void @intptr_t_kernel(i32* %a, i32* nocapture %b)
nounwind noinline {
entry:
%0 = load i32* %a, align 4, !tbaa !1
%add.ptr = getelementptr inbounds i32* %a, i32 %0
%conv = ptrtoint i32* %add.ptr to i32
store i32 %conv, i32* %a, align 4, !tbaa !1
ret void
}
Thanks a lot for help.
Regards...
2012 May 16
0
[LLVMdev] NVPTX: __iAtomicCAS support ?
...known"
>
> %struct.kernelgen_callback_t = type { i32, i32,
> %"struct.kernelgen::kernel_t"*, i32, i32,
> %struct.kernelgen_callback_data_t* }
> %"struct.kernelgen::kernel_t" = type opaque
> %struct.kernelgen_callback_data_t = type opaque
>
> define ptx_kernel void @_Z17kernelgen_monitorPi(i32* %callback)
> nounwind {
> entry:
> %callback.addr = alloca i32*, align 8
> store i32* %callback, i32** %callback.addr, align 8
> %0 = load i32** %callback.addr, align 8
> %1 = bitcast i32* %0 to %struct.kernelgen_callback_t*
> %lock...
2013 Jan 22
1
[LLVMdev] Compiling to NVPTX
I'm in the process of writing a library and giving a talk about writing
compilers using LLVM (llvm-c) and Clojure. As part of my talk I'd like to
give an example of a program running on CUDA.
Are there any papers, tutorials, or examples on writing a custom frontend for
NVPTX? For instance, I'm trying to figure out how to get access to "global"
variables like blockidx. I know
2013 Mar 11
2
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all,
Attached notunrolled.ll is a module containing a reduction kernel. What I'm
trying to do is to unroll it in such a way that the partial reduction over
unrolled iterations is performed in a register and then stored to
memory only once. Currently LLVM's unroller together with all standard
optimizations produces code that stores the value to memory after every
unrolled iteration, which is
2015 Jan 28
3
[LLVMdev] Inconsistencies or intended behaviour of LLVM IR?
...- undocumented calling conventions
The following calling conventions are valid tokens but not described in
the language references as of revision 223189:
intel_ocl_bicc, x86_stdcallcc, x86_fastcallcc, x86_thiscallcc,
kw_x86_vectorcallcc, arm_apcscc, arm_aapcscc, arm_aapcs_vfpcc,
msp430_intrcc, ptx_kernel, ptx_device, spir_kernel, spir_func,
x86_64_sysvcc, x86_64_win64cc, kw_ghccc
Lastly I'd just like to thank the LLVM developers for all the time and
hard work they've put into this project. I'd especially like to thank
you for providing a language specification alongside the re...
2011 Oct 13
4
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...in essence different
> types and cannot alias. For the kernel shown above, the resulting LLVM IR
> could be:
>
> ; ModuleID = 'test1.cl'
> target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
> target triple = "ptx32--"
>
> define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)*
> nocapture %b) nounwind noinline {
> entry:
> %0 = load float* %a, align 4, !tbaa !1
> store float %0, float addrspace(4)* %b, align 4, !tbaa *!2*
> ret void
> }
>
> !opencl.kernels = !{!0}
>
> !0 = metadata !...
2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
...---------------------------------------------------------------------
> target datalayout = ...
> target triple = "x86_64-unknown-linux-gnu"
>
> @llvm_kernel = private unnamed_addr constant llvm_kernel {
> target triple = nvptx64-unknown-unknown
> define internal ptx_kernel void @gpu_kernel(i8* %Array) {
> ...
> }
> }
> ------------------------------------------------------------------------
>
> By default the global will be compiled to an LLVM string stored in the
> object file. We could also think about translating it to PTX or AMD'...
2011 Oct 13
1
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...and cannot alias. For the kernel shown above, the resulting LLVM IR
> > could be:
> >
> > ; ModuleID = 'test1.cl'
> > target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
> > target triple = "ptx32--"
> >
> > define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)*
> > nocapture %b) nounwind noinline {
> > entry:
> > %0 = load float* %a, align 4, !tbaa !1
> > store float %0, float addrspace(4)* %b, align 4, !tbaa *!2*
> > ret void
> > }
> >
> > !opencl.ker...