Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] [PROPOSAL] LLVM multi-module support"
2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
Hi Tobias, I didn't really get it. Is the idea that the same bitcode is
going to be codegen'd for different architectures, or is each sub-module
going to contain different bitcode? In the latter case you may as well
just use multiple modules, perhaps in conjunction with a scheme to store
more than one module in the same file on disk as a convenience.
Ciao, Duncan.
> a couple of weeks
2012 Jul 26
7
[LLVMdev] [PROPOSAL] LLVM multi-module support
In our project we combine regular binary code and LLVM IR code for kernels,
embedded as a special data symbol of an ELF object. The LLVM IR for a kernel
that exists at compile time is preliminary, and may be optimized further
at runtime (pointer analysis, Polly, etc.). During application
startup, the runtime system builds an index of all kernel sources embedded
into the executable. Host and kernel code
2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
Tobias Grosser <tobias at grosser.es> writes:
> o Modeling sub-architectures on a per-function basis
>
> Functions could be specialized for a certain sub-architecture. This is
> helpful to have certain functions optimized e.g. with AVX2 enabled, but
> the general program being compiled for a more generic architecture.
> We do not address per-function annotations in this
2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
I'm not convinced that having multi-module IR files is the way to go. It
just seems like a lot of infrastructure/design work for little gain. Can
the embedded modules have embedded modules themselves? How deep can this
go? If not, then the embedded LLVM IR language is really a subset of the
full LLVM IR language. How do you share variables between parent and
embedded modules?
I feel that
2012 Apr 02
6
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi all,
I am a PhD student from Huazhong University of Sci&Tech, China. The
following is my GSoC 2012 proposal.
Comments are welcome!
*Title: Automatic GPGPU Code Generation for LLVM*
*Abstract*
Very often, manually developing a GPGPU application is a time-consuming,
complex, error-prone and iterative process. In this project, I propose to
build an automatic GPGPU code generation framework
2012 Apr 04
3
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On 04/03/2012 03:13 PM, Hongbin Zheng wrote:
> Hi Yabin,
>
> Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
> can also improve llc/lli or create new tools to support code
> generation for heterogeneous platforms[1], i.e. generate code for more
> than one target architecture at the same time. Something like this is
> not very complicated and has
2013 Mar 11
2
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all,
Attached notunrolled.ll is a module containing a reduction kernel. What I'm
trying to do is to unroll it in such a way that the partial reduction over
the unrolled iterations is performed in a register and then stored to
memory only once. Currently LLVM's unroller together with all standard
optimizations produces code that stores the value to memory after every
unrolled iteration, which is
2017 Jun 22
2
Legal names for Functions and other Identifiers
Thanks for the heads-up, Philip!
I did come across a strange case where LLVM allowed "%" to be part of a
function's name. This was in the context of my patch
https://reviews.llvm.org/D33985, where I prefix the name of the source
function and the Scop (a special kind of Region that Polly can optimize;
the name of the Scop is the name of the Region) to the name of the PTX
kernel
2012 Apr 04
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On Wed, Apr 4, 2012 at 4:49 AM, Tobias Grosser <tobias at grosser.es> wrote:
> On 04/03/2012 03:13 PM, Hongbin Zheng wrote:
> > Hi Yabin,
> >
> > Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
> > can also improve llc/lli or create new tools to support code
> > generation for heterogeneous platforms[1], i.e. generate code for
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
I tried to manually assign each of the 3 arrays a unique TBAA node, but it
does not seem to help: alias analysis still treats the arrays as may-alias,
which most likely prevents the desired optimization. Below is the sample code
with TBAA metadata inserted. Could you please suggest what might be wrong
with it?
Many thanks,
- D.
marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa
2012 Apr 03
2
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin,
2012/4/3 Justin Holewinski <justin.holewinski at gmail.com>
> *Motivation*
>> With the broad proliferation of GPU computing, it is very important to
>> provide an easy, automatic tool that lets ordinary developers develop or
>> port applications to GPUs, especially those domain experts who want to
>> harness the huge computing power of GPUs. Polly
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Yabin,
Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
can also improve llc/lli or create new tools to support code
generation for heterogeneous platforms[1], i.e. generate code for more
than one target architecture at the same time. Something like this is
not very complicated and has been implemented[2,3] by some people, but
is not available in mainline LLVM.
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
On Mon, Apr 2, 2012 at 7:16 AM, Yabin Hu <yabin.hwu at gmail.com> wrote:
> Hi all,
>
> I am a PhD student from Huazhong University of Sci&Tech, China. The
> following is my GSoC 2012 proposal.
> Comments are welcome!
>
> *Title: Automatic GPGPU Code Generation for LLVM*
>
> *Abstract*
> Very often, manually developing a GPGPU application is a
2012 May 07
2
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Tobias Grosser <tobias at grosser.es> writes:
>> Doesn't LLVM support taking the address of a function in another address
>> space? If not it probably should.
>
> Hi Dave,
>
> I highly appreciate your idea of integrating heterogeneous computing
> features directly into LLVM-IR. I believe this can be a way worth
> going, but I doubt now is the right moment
2020 Sep 24
2
cuda __shfl_sync problem
Hi,
First of all, I'm not sure if I should be posting this here or in
cfe-dev, but here it goes.
In order to instrument CUDA kernels, I first generate device IR with:
clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o
device.bc
I also have a library that contains the instrumentation stubs, for which
I generate IR similarly, and I link it with the device IR
2012 Apr 03
0
[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hi Justin,
the non-translatable IR with GPU code replaced by appropriate CUDA Driver
> API calls.
One of the CUDA driver APIs (cuLaunch) needs a PTX asm string as its input. So
if I want to provide a one-touch solution without introducing any changes
to tools outside Polly, I must prepare the PTX string before I can generate
the correct non-translatable IR part.
As you suggested, it may
2020 Sep 25
2
cuda __shfl_sync problem
Do you mean in llc? Because I don't see such an option, I'm afraid.
~George
On 24-09-2020 20:54, Johannes Doerfert wrote:
> Not that I am an expert but it looks like it defaults to the minimal
> PTX version that supports the compute capability. You might be able to
> choose PTX 6.0 though.
>
> ~ Johannes
>
>
> On 9/24/20 1:02 PM, George K via llvm-dev wrote:
2013 Jan 22
1
[LLVMdev] Compiling to NVPTX
I'm in the process of writing a library and giving a talk about writing
compilers using LLVM (llvm-c) and Clojure. As part of my talk I'd like to
give an example of a program running on CUDA.
Are there any papers, tutorials, or examples on writing a custom frontend for
NVPTX? For instance, I'm trying to figure out how to get access to "global"
variables like blockidx. I know
2012 Jun 12
2
[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it
Dear LLVM NVPTX maintainers,
Just to have the issue recorded, I don't know how important it is:
clang generates linkonce_odr out of __inline__, and NVPTX generates .weak
out of linkonce_odr (how it happens - a big question, btw, because I can't
find anything related in NVPTX asm printer - does it chain to some other
printer?), and finally ptxas (both 4.2 and 5) fails to compile it to
2011 Oct 13
2
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
On Thu, Oct 13, 2011 at 06:59:47PM +0000, Villmow, Micah wrote:
> Justin,
> Out of these options, I would take the metadata approach for AA support.
>
> This doesn't solve the problem of different frontend/backends choosing different
> address space representations for the same language, but is the correct
> approach for providing extra information to the optimizations.