thr3ads.net - search: "ptx

Displaying 20 results from an estimated 32 matches for "ptx_kernel".

Legal names for Functions and other Identifiers

2017 Jun 22

Thanks for the heads-up, Philip! I did come across a strange case where LLVM allowed "%" to be part of a function's name. This was in the context of my patch https://reviews.llvm.org/D33985, where I prefix the name of the source function and the Scop (a special kind of Region that Polly can optimize; the name of the Scop is the name of the Region) to the name of the PTX kernel

[LLVMdev] NVPTX: why ret instruction is not translated to exit in kernel function?

2012 May 15

Hi Justin, In the PTX backend, the "ret" instruction at the end of a ptx_kernel function is translated to an "exit" instruction. A test case named exit.ll demonstrates this. But in the NVPTX backend, it seems that you didn't do such a translation. Why did you choose this? Is this due to changes in NVIDIA PTX itself? Thanks, Yabin

[LLVMdev] NVPTX annotation metadata emission

2012 May 07

This new metadata format is currently optional. The old ptx_kernel calling convention should still work. The only thing you should have to change when converting from PTX -> NVPTX is the address space map. The calling conventions and intrinsics should be compatible with both.

[LLVMdev] NVPTX annotation metadata emission

2012 May 07

Hi everybody, I have noticed that the new NVPTX backend requires new metadata to identify the kernels in the module: define void @metadata_kernel(float* %a) { ret void } !nvvm.annotations = !{!1} !1 = metadata !{void (float*)* @metadata_kernel, metadata !"kernel", i32 1} Is clang going to support the emission of this metadata soon? Or do I have to write it on my own? :) Thanks,

[LLVMdev] [PROPOSAL] LLVM multi-module support

2012 Jul 26

...----------------------------------------------------------------------- > target datalayout = ... > target triple = "x86_64-unknown-linux-gnu" > > @llvm_kernel = private unnamed_addr constant llvm_kernel { > target triple = nvptx64-unknown-unknown > define internal ptx_kernel void @gpu_kernel(i8* %Array) { > ... > } > } > ------------------------------------------------------------------------ > > By default the global will be compiled to a llvm string stored in the > object file. We could also think about translating it to PTX or AMD's...

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 03

Hi Justin, the non-translatable IR with GPU code replaced by appropriate CUDA Driver > API calls. One of the CUDA driver APIs (cuLaunch) needs a PTX asm string as its input. So if I want to provide a one-touch solution and not introduce any changes to tools outside Polly, I must prepare the PTX string before I can generate the correct non-translatable IR part. As you suggested, it may

[LLVMdev] [PROPOSAL] LLVM multi-module support

2012 Jul 26

...dules as global variables. ------------------------------------------------------------------------ target datalayout = ... target triple = "x86_64-unknown-linux-gnu" @llvm_kernel = private unnamed_addr constant llvm_kernel { target triple = nvptx64-unknown-unknown define internal ptx_kernel void @gpu_kernel(i8* %Array) { ... } } ------------------------------------------------------------------------ By default the global will be compiled to a llvm string stored in the object file. We could also think about translating it to PTX or AMD's HSA-IL, such that e.g. PTX can b...

[LLVMdev] RFC: Representation of OpenCL Memory Spaces

2011 Oct 13

...g in the following kernel: __kernel void foo(__global float* a, __local float* b) { b[0] = a[0]; } If we compile this with Clang targeting PTX, the resulting LLVM IR will be: target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64" target triple = "ptx32--" define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)* nocapture %b) nounwind noinline { entry: %0 = load float* %a, align 4, !tbaa !1 store float %0, float addrspace(4)* %b, align 4, !tbaa !1 ret void } !opencl.kernels = !{!0} !0 = metadata !{void (float*, float addrspace(4)*)* @foo} !1 = meta...

[LLVMdev] NVPTX: __iAtomicCAS support ?

2012 May 16

...; target triple = "ptx64-unknown-unknown" %struct.kernelgen_callback_t = type { i32, i32, %"struct.kernelgen::kernel_t"*, i32, i32, %struct.kernelgen_callback_data_t* } %"struct.kernelgen::kernel_t" = type opaque %struct.kernelgen_callback_data_t = type opaque define ptx_kernel void @_Z17kernelgen_monitorPi(i32* %callback) nounwind { entry: %callback.addr = alloca i32*, align 8 store i32* %callback, i32** %callback.addr, align 8 %0 = load i32** %callback.addr, align 8 %1 = bitcast i32* %0 to %struct.kernelgen_callback_t* %lock = getelementptr inbounds %struct.ke...

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 03

Hi Justin, 2012/4/3 Justin Holewinski <justin.holewinski at gmail.com> > *Motivation* >> With the broad proliferation of GPU computing, it is very important to >> provide an easy and automatic tool to develop or port the applications to >> GPU for normal developers, especially for those domain experts who want to >> harness the huge computing power of GPU. Polly

[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

2012 Apr 02

...t various parallel loops which can be described by Polly's polyhedral model. We first translate the selected SCoPs (Static Control Parts) into loops of depth 4 with Polly's schedule optimization. Then we extract the loop body (or inner non-parallel loops) into an LLVM sub-function, tagged with the PTX_Kernel or PTX_Device calling convention. After that, we use the PTX backend to translate the sub-functions into a string of the corresponding PTX code. Finally, we provide a runtime library to generate the executable program. There are three key challenges in this project. 1. How to get the optimal execut...

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

...' target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64" target triple = "nvptx64-unknown-unknown" @__kernelgen_version = constant [15 x i8] c"0.2/1654:1675M\00" define ptx_kernel void @__kernelgen_matvec_loop_7(i32* nocapture) #0 { "Loop Function Root": %tid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.tid.x() %ctaid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() %PositionOfBlockInGrid.x = shl i32 %ctaid.x, 9 %BlockLB.Add.ThreadPo...

[LLVMdev] intptr_t support in llvm

2013 Jan 31

...tion ptrtoint). Or is there a better way to make sure that the semantics of intptr_t are preserved in clang/llvm for all archs? The code I tried to compile is __kernel void intptr_t_kernel(int *a, int *b) { intptr_t c = (intptr_t)a; b = (int*) c; *a += b; } and the code generated is define ptx_kernel void @intptr_t_kernel(i32* %a, i32* nocapture %b) nounwind noinline { entry: %0 = load i32* %a, align 4, !tbaa !1 %add.ptr = getelementptr inbounds i32* %a, i32 %0 %conv = ptrtoint i32* %add.ptr to i32 store i32 %conv, i32* %a, align 4, !tbaa !1 ret void } Thanks a lot for the help. Regards...

[LLVMdev] NVPTX: __iAtomicCAS support ?

2012 May 16

...known" > > %struct.kernelgen_callback_t = type { i32, i32, > %"struct.kernelgen::kernel_t"*, i32, i32, > %struct.kernelgen_callback_data_t* } > %"struct.kernelgen::kernel_t" = type opaque > %struct.kernelgen_callback_data_t = type opaque > > define ptx_kernel void @_Z17kernelgen_monitorPi(i32* %callback) > nounwind { > entry: > %callback.addr = alloca i32*, align 8 > store i32* %callback, i32** %callback.addr, align 8 > %0 = load i32** %callback.addr, align 8 > %1 = bitcast i32* %0 to %struct.kernelgen_callback_t* > %lock...

[LLVMdev] Compiling to NVPTX

2013 Jan 22

I'm in the process of writing a library and giving a talk about writing compilers using LLVM (llvm-c) and Clojure. As part of my talk I'd like to give an example of a program running on CUDA. Are there any papers, tutorials, or examples on writing a custom frontend for NVPTX? For instance, I'm trying to figure out how to get access to "global" variables like blockidx. I know

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

Dear all, The attached notunrolled.ll is a module containing a reduction kernel. What I'm trying to do is unroll it in such a way that the partial reduction over the unrolled iterations is performed in a register and then stored to memory only once. Currently LLVM's unroller, together with all standard optimizations, produces code which stores the value to memory after every unrolled iteration, which is

[LLVMdev] Inconsistencies or intended behaviour of LLVM IR?

2015 Jan 28

...- undocumented calling conventions The following calling conventions are valid tokens but not described in the language reference as of revision 223189: intel_ocl_bicc, x86_stdcallcc, x86_fastcallcc, x86_thiscallcc, kw_x86_vectorcallcc, arm_apcscc, arm_aapcscc, arm_aapcs_vfpcc, msp430_intrcc, ptx_kernel, ptx_device, spir_kernel, spir_func, x86_64_sysvcc, x86_64_win64cc, kw_ghccc Lastly I'd just like to thank the LLVM developers for all the time and hard work they've put into this project. I'd especially like to thank you for providing a language specification alongside the re...

[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces

2011 Oct 13

...in essence different > types and cannot alias. For the kernel shown above, the resulting LLVM IR > could be: > > ; ModuleID = 'test1.cl' > target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64" > target triple = "ptx32--" > > define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)* > nocapture %b) nounwind noinline { > entry: > %0 = load float* %a, align 4, !tbaa !1 > store float %0, float addrspace(4)* %b, align 4, !tbaa *!2* > ret void > } > > !opencl.kernels = !{!0} > > !0 = metadata !...

[LLVMdev] [PROPOSAL] LLVM multi-module support

2012 Jul 26

...--------------------------------------------------------------------- > target datalayout = ... > target triple = "x86_64-unknown-linux-gnu" > > @llvm_kernel = private unnamed_addr constant llvm_kernel { > target triple = nvptx64-unknown-unknown > define internal ptx_kernel void @gpu_kernel(i8* %Array) { > ... > } > } > ------------------------------------------------------------------------ > > By default the global will be compiled to a llvm string stored in the > object file. We could also think about translating it to PTX or AMD'...

[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces

2011 Oct 13

...and cannot alias. For the kernel shown above, the resulting LLVM IR > > could be: > > > > ; ModuleID = 'test1.cl' > > target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64" > > target triple = "ptx32--" > > > > define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)* > > nocapture %b) nounwind noinline { > > entry: > > %0 = load float* %a, align 4, !tbaa !1 > > store float %0, float addrspace(4)* %b, align 4, !tbaa *!2* > > ret void > > } > > > > !opencl.ker...
