similar to: [LLVMdev] [PROPOSAL] LLVM multi-module support

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] [PROPOSAL] LLVM multi-module support"

2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
Hi Tobias, I didn't really get it. Is the idea that the same bitcode is going to be codegen'd for different architectures, or is each sub-module going to contain different bitcode? In the latter case you may as well just use multiple modules, perhaps in conjunction with a scheme to store more than one module in the same file on disk as a convenience. Ciao, Duncan. > a couple of weeks
2012 Jul 26
7
[LLVMdev] [PROPOSAL] LLVM multi-module support
In our project we combine regular binary code and LLVM IR code for kernels, embedded as a special data symbol of an ELF object. The LLVM IR for kernels that exists at compile time is preliminary and may be optimized further at runtime (pointer analysis, Polly, etc.). During application startup, the runtime system builds an index of all kernel sources embedded into the executable. Host and kernel code
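As a rough sketch of the embedding idea described above (the section and symbol names here are invented, not the project's actual ones), kernel bitcode can be carried as a plain data symbol that a runtime locates and indexes at startup:

    ; Hedged sketch: the four bytes are just the bitcode magic ('B' 'C' 0xC0 0xDE),
    ; standing in for a real bitcode payload emitted by the compiler.
    @__kernel_ir_sum = internal constant [4 x i8] c"BC\C0\DE", section ".kernel_ir", align 1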
2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
Tobias Grosser <tobias at grosser.es> writes: > o Modeling sub-architectures on a per-function basis > > Functions could be specialized for a certain sub-architecture. This is > helpful for having certain functions optimized, e.g. with AVX2 enabled, while > the general program is compiled for a more generic architecture. > We do not address per-function annotations in this
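For reference, LLVM later gained per-function target attributes that express exactly this kind of per-function sub-architecture choice. A minimal sketch (the function names are hypothetical):

    define void @avx2_kernel() #0 {
      ret void
    }

    define void @generic_code() #1 {
      ret void
    }

    attributes #0 = { "target-cpu"="haswell" "target-features"="+avx2" }
    attributes #1 = { "target-cpu"="x86-64" }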
2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
I'm not convinced that having multi-module IR files is the way to go. It just seems like a lot of infrastructure/design work for little gain. Can the embedded modules have embedded modules themselves? How deep can this go? If not, then the embedded LLVM IR language is really a subset of the full LLVM IR language. How do you share variables between parent and embedded modules? I feel that
2012 Apr 02
6
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi all, I am a PhD student from Huazhong University of Sci&Tech, China. The following is my GSoC 2012 proposal. Comments are welcome! *Title: Automatic GPGPU Code Generation for LLVM* *Abstract* Very often, manually developing a GPGPU application is a time-consuming, complex, error-prone and iterative process. In this project, I propose to build an automatic GPGPU code generation framework
2012 Apr 04
3
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On 04/03/2012 03:13 PM, Hongbin Zheng wrote: > Hi Yabin, > > Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you > could also improve llc/lli or create new tools to support code > generation for heterogeneous platforms[1], i.e. generate code for more > than one target architecture at the same time. Something like this is > not very complicated and had
2013 Mar 11
2
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all, The attached notunrolled.ll is a module containing a reduction kernel. What I'm trying to do is to unroll it in such a way that the partial reduction over the unrolled iterations is performed in a register, and stored to memory only once. Currently LLVM's unroller together with all standard optimizations produces code that stores the value to memory after every unrolled iteration, which is
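In IR terms, the shape being asked for is an accumulator carried by a phi node (i.e. kept in a register) that reaches memory only once after the loop. A minimal hand-written sketch of that shape in modern IR syntax (this is not the attached notunrolled.ll, and it assumes %n is at least 1):

    define float @reduce(float* %a, i64 %n) {
    entry:
      br label %loop
    loop:
      %i   = phi i64   [ 0, %entry ],            [ %i.next, %loop ]
      %acc = phi float [ 0.000000e+00, %entry ], [ %acc.next, %loop ]
      %p   = getelementptr float, float* %a, i64 %i
      %v   = load float, float* %p
      %acc.next = fadd float %acc, %v        ; partial sum stays in a register
      %i.next   = add i64 %i, 1
      %done     = icmp eq i64 %i.next, %n
      br i1 %done, label %exit, label %loop
    exit:
      ret float %acc.next                    ; the value leaves the loop exactly once
    }

After unrolling, the goal is the same picture with several fadds per iteration feeding one phi, rather than a store after each one.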
2017 Jun 22
2
Legal names for Functions and other Identifiers
Thanks for the heads-up, Philip! I did come across a strange case where LLVM allowed "%" to be part of a function's name. This was in the context of my patch https://reviews.llvm.org/D33985, where I prefix the name of the source function and the Scop (a special kind of Region that Polly can optimize; the name of the Scop is the name of the Region) to the name of the PTX kernel
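For reference, the IR level itself is permissive here: almost any character, including %, is legal in a global name as long as the name is quoted, which is how such a name can survive all the way to the PTX printer. A minimal sketch (the name is made up):

    define void @"polly%kernel_0"() {
      ret void
    }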
2012 Apr 04
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On Wed, Apr 4, 2012 at 4:49 AM, Tobias Grosser <tobias at grosser.es> wrote: > On 04/03/2012 03:13 PM, Hongbin Zheng wrote: > > Hi Yabin, > > > > Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you > > could also improve llc/lli or create new tools to support code > > generation for heterogeneous platforms[1], i.e. generate code for
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
I tried to manually assign each of the 3 arrays a unique TBAA node. But it does not seem to help: alias analysis still considers the arrays may-alias, which most likely prevents the desired optimization. Below is the sample code with TBAA metadata inserted. Could you please suggest what might be wrong with it? Many thanks, - D. marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa
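For comparison, here is a minimal self-contained example of distinct struct-path TBAA tags (the root and type names are invented, and this uses the newer metadata syntax rather than whatever the attached module used):

    define void @copy(float* %a, float* %b) {
      %v = load float, float* %a, !tbaa !3     ; tagged as "array A"
      store float %v, float* %b, !tbaa !4      ; tagged as "array B"
      ret void
    }

    !0 = !{!"hypothetical tbaa root"}
    !1 = !{!"array A", !0}
    !2 = !{!"array B", !0}
    !3 = !{!1, !1, i64 0}    ; access tag: base type, access type, offset
    !4 = !{!2, !2, i64 0}

With two different leaves under the same root, TBAA can answer NoAlias for the two accesses; a common pitfall is tags that accidentally share one scalar type node, which degrades to MayAlias.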
2012 Apr 03
2
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin, 2012/4/3 Justin Holewinski <justin.holewinski at gmail.com> > *Motivation* >> With the broad proliferation of GPU computing, it is very important to >> provide an easy and automatic tool for normal developers to develop or port >> applications to the GPU, especially for those domain experts who want to >> harness the huge computing power of GPUs. Polly
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Yabin, Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you could also improve llc/lli or create new tools to support code generation for heterogeneous platforms[1], i.e. generate code for more than one target architecture at the same time. Something like this is not very complicated and has been implemented[2,3] by some people, but it is not available in LLVM mainline.
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On Mon, Apr 2, 2012 at 7:16 AM, Yabin Hu <yabin.hwu at gmail.com> wrote: > Hi all, > > I am a PhD student from Huazhong University of Sci&Tech, China. The > following is my GSoC 2012 proposal. > Comments are welcome! > > *Title: Automatic GPGPU Code Generation for LLVM* > > *Abstract* > Very often, manually developing a GPGPU application is a
2012 May 07
2
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Tobias Grosser <tobias at grosser.es> writes: >> Doesn't LLVM support taking the address of a function in another address >> space? If not, it probably should. > > Hi Dave, > > I highly appreciate your idea of integrating heterogeneous computing > features directly into LLVM-IR. I believe this may be a path worth > taking, but I doubt now is the right moment
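For the record, much later LLVM did gain a notion of a program address space, so functions can live in a non-zero address space. A heavily hedged sketch of the eventual syntax (the address-space number is arbitrary, and the datalayout shown is the minimal piece needed to make 1 the default program space):

    target datalayout = "P1"

    define void @kernel() addrspace(1) {
      ret void
    }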
2020 Sep 24
2
cuda __shfl_sync problem
Hi, First of all, I'm not sure if I should be posting this here or in cfe-dev, but here it goes. In order to instrument CUDA kernels, I first generate device IR with: clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc I also have a library that contains the instrumentation stubs, for which I generate IR similarly and link it with the device IR
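For illustration only (the stub name and signature below are invented, not the actual library's), the device-side picture before linking is simply a kernel calling an externally defined stub, whose body arrives when the separately compiled instrumentation module is linked in with llvm-link:

    declare void @__instr_kernel_entry()

    define void @some_kernel() {
      call void @__instr_kernel_entry()   ; resolved at llvm-link time
      ret void
    }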
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin, > the non-translatable IR with GPU code replaced by appropriate CUDA Driver > API calls. One of the CUDA Driver APIs (cuLaunch) needs a PTX asm string as its input. So if I want to provide a one-touch solution without introducing any changes to tools outside Polly, I must prepare the PTX string before I can generate the correct non-translatable IR part. As you suggest, it may
2020 Sep 25
2
cuda __shfl_sync problem
Do you mean in llc? Because I don't see such an option, I'm afraid. ~George On 24-09-2020 20:54, Johannes Doerfert wrote: > Not that I am an expert, but it looks like it defaults to the minimal > PTX version that supports the compute capability. You might be able to > choose PTX 6.0 though. > > ~ Johannes > > On 9/24/20 1:02 PM, George K via llvm-dev wrote:
2013 Jan 22
1
[LLVMdev] Compiling to NVPTX
I'm in the process of writing a library and giving a talk about writing compilers using LLVM (llvm-c) and Clojure. As part of my talk I'd like to give an example of a program running on CUDA. Are there any papers, tutorials, or examples on writing a custom frontend for NVPTX? For instance, I'm trying to figure out how to get access to "global" variables like blockIdx. I know
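For what it's worth, at the IR level those CUDA "globals" are not variables at all: the NVPTX backend exposes the special registers as intrinsics. A small sketch computing a global thread id, assuming the usual NVVM intrinsic names:

    declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()   ; blockIdx.x
    declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()    ; blockDim.x
    declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()     ; threadIdx.x

    define i32 @global_thread_id_x() {
      %bid  = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
      %dim  = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
      %tid  = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
      %base = mul i32 %bid, %dim
      %gid  = add i32 %base, %tid
      ret i32 %gid
    }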
2012 Jun 12
2
[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it
Dear LLVM NVPTX maintainers, Just to have the issue recorded (I don't know how important it is): clang generates linkonce_odr out of __inline__, and NVPTX generates .weak out of linkonce_odr (how this happens is a big question, btw, because I can't find anything related in the NVPTX asm printer - does it chain to some other printer?), and finally ptxas (both 4.2 and 5) fails to compile it to
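A minimal IR-level reproducer of the pattern described (the function is hypothetical): __inline__ becomes linkonce_odr, which the NVPTX printer was then emitting as .weak:

    define linkonce_odr i32 @inlined_helper() {
      ret i32 42
    }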
2011 Oct 13
2
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
On Thu, Oct 13, 2011 at 06:59:47PM +0000, Villmow, Micah wrote: > Justin, > Out of these options, I would take the metadata approach for AA support. > > This doesn't solve the problem of different frontends/backends choosing different > address space representations for the same language, but it is the correct > approach for providing extra information to the optimizations.
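To make the two representations concrete: the address-space encoding puts the OpenCL space directly into the pointer type, e.g. (the 1 = global, 3 = local numbering below is one target's convention, not a universal mapping):

    define void @kern(float addrspace(1)* %gbuf, float addrspace(3)* %lbuf) {
      %v = load float, float addrspace(1)* %gbuf
      store float %v, float addrspace(3)* %lbuf
      ret void
    }

The metadata approach instead keeps pointers in a single space and attaches the language-level space as extra information for the optimizers, which is the trade-off discussed above.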