In our project we combine regular binary code and LLVM IR code for kernels,
embedded as a special data symbol of the ELF object. The LLVM IR for a
kernel existing at compile time is preliminary, and may be optimized further
at runtime (pointer analysis, Polly, etc.). During application startup, the
runtime system builds an index of all kernel sources embedded in the
executable. Host and kernel code interact by means of a special "launch"
call, which does not simply optimize, compile, and execute the kernel, but
first estimates whether that is worthwhile, or whether it is better to fall
back to the equivalent host code.

The proposal made by Tobias is very elegant, but it seems to address the
case where host and sub-architecture code exist at the same time. May I
kindly point out that, in our experience, really efficient, deeply
specialized sub-architecture code may simply not exist at compile time,
while the generic baseline host code always can.

Best,
- Dima.

2012/7/26 Duncan Sands <baldrick at free.fr>

> Hi Tobias, I didn't really get it. Is the idea that the same bitcode is
> going to be codegen'd for different architectures, or is each sub-module
> going to contain different bitcode? In the latter case you may as well
> just use multiple modules, perhaps in conjunction with a scheme to store
> more than one module in the same file on disk as a convenience.
>
> Ciao, Duncan.
>
> > a couple of weeks ago I discussed with Peter how to improve LLVM's
> > support for heterogeneous computing. One weakness we (and others) have
> > seen is the absence of multi-module support in LLVM. Peter came up with
> > a nice idea how to improve here. I would like to put this idea up for
> > discussion.
> >
> > ## The problem ##
> >
> > LLVM-IR modules can currently only contain code for a single target
> > architecture. However, there are multiple use cases where one
> > translation unit could contain code for several architectures.
> >
> > 1) CUDA
> >
> > CUDA source files can contain both host and device code. The absence of
> > multi-module support complicates adding CUDA support to clang, as clang
> > would need to perform multi-module compilation on top of a
> > single-module based compiler framework.
> >
> > 2) C++ AMP
> >
> > C++ AMP [1] contains - similarly to CUDA - both host code and device
> > code in the same source file. Even though C++ AMP is a Microsoft
> > extension, the use case itself is relevant to clang. It would be great
> > if LLVM provided infrastructure such that front-ends could easily
> > target accelerators. This would probably yield a lot of interesting
> > experiments.
> >
> > 3) Optimizers
> >
> > To fully automatically offload computations to an accelerator, an
> > optimization pass needs to extract the computation kernels and schedule
> > them as separate kernels on the device. Such kernels are normally
> > LLVM-IR modules for different architectures. At the moment, passes have
> > no way to create and store new LLVM-IR modules. There is also no way
> > to reference kernel LLVM-IR modules from a host module (which is
> > necessary to pass them to the accelerator run-time).
> >
> > ## Goals ##
> >
> > a) No major changes to existing tools and LLVM-based applications
> >
> > b) Human-readable and -writable LLVM-IR
> >
> > c) FileCheck testability
> >
> > d) Do not force a specific execution model
> >
> > e) Unlimited number of embedded modules
> >
> > ## Detailed Goals ##
> >
> > a)
> >   o No changes should be required if a tool does not use multi-module
> >     support. Each LLVM-IR file valid today should remain valid.
> >
> >   o Major tools should support basic heterogeneous modules without
> >     large changes. Some of the commands that should work after smaller
> >     adaptations:
> >
> >     clang -S -emit-llvm -o out.ll
> >     opt -O3 out.ll -o out.opt.ll
> >     llc out.opt.ll
> >     lli out.opt.ll
> >     bugpoint -O3 out.opt.ll
> >
> > b) All (sub)modules should be directly human readable/writable.
> >    There should be no need to extract single modules before modifying
> >    them.
> >
> > c) The LLVM-IR generated from a heterogeneous multi-module should
> >    easily be 'FileCheck'able. The same is true if a multi-module is
> >    the result of an optimization.
> >
> > d) In CUDA/OpenCL/C++ AMP, kernels are scheduled from within the host
> >    code. This means arbitrary host code can decide under which
> >    conditions kernels are scheduled for execution. It is therefore
> >    necessary to reference individual sub-modules from within the host
> >    module.
> >
> > e) CUDA/OpenCL allow compiling and scheduling an arbitrary number of
> >    kernels. We do not want to put an artificial limit on the number of
> >    modules they are represented in. This means a single embedded
> >    submodule is not enough.
> >
> > ## Non-Goals ##
> >
> > o Modeling sub-architectures on a per-function basis
> >
> > Functions could be specialized for a certain sub-architecture. This is
> > helpful to have certain functions optimized, e.g., with AVX2 enabled,
> > but the general program being compiled for a more generic architecture.
> > We do not address per-function annotations in this proposal.
> >
> > ## Proposed solution ##
> >
> > To bring multi-module support to LLVM, we propose to add a new type
> > called 'llvmir' to LLVM-IR. It can be used to embed LLVM-IR submodules
> > as global variables.
> >
> > ------------------------------------------------------------------------
> > target datalayout = ...
> > target triple = "x86_64-unknown-linux-gnu"
> >
> > @llvm_kernel = private unnamed_addr constant llvm_kernel {
> >   target triple = nvptx64-unknown-unknown
> >   define internal ptx_kernel void @gpu_kernel(i8* %Array) {
> >     ...
> >   }
> > }
> > ------------------------------------------------------------------------
> >
> > By default the global will be compiled to an LLVM-IR string stored in
> > the object file. We could also think about translating it to PTX or
> > AMD's HSA-IL, such that e.g. PTX can be passed to a run-time library.
> >
> > From my point of view, Peter's idea allows us to add multi-module
> > support in a way that lets us reach the goals described above.
> > However, to properly design and implement it, early feedback would be
> > valuable.
> >
> > Cheers
> > Tobi
> >
> > [1] http://msdn.microsoft.com/en-us/library/hh265137%28v=vs.110%29
> > [2] http://www.amd.com/us/press-releases/Pages/amd-arm-computing-innovation-2012june12.aspx
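To make the flow Dmitry describes concrete, here is a rough C sketch of such
a "launch" wrapper, assuming the kernel's IR is embedded in the executable.
Every name in it (kernel_ir, worth_offloading, jit_compile_and_run,
run_host_fallback) is hypothetical, and the real runtime surely differs:

------------------------------------------------------------------------
#include <stddef.h>
#include <stdio.h>

/* IR blob embedded as a data symbol in the ELF object (placeholder here). */
static const char   kernel_ir[]   = "; LLVM IR of the kernel ...";
static const size_t kernel_ir_len = sizeof(kernel_ir);

/* Hypothetical hooks: a real runtime would wrap a cost model and an LLVM
 * JIT (running pointer analysis, Polly, etc. before codegen). */
static int worth_offloading(const char *kernel, void *args)
{
    (void)kernel; (void)args;
    return 0;  /* stub: always fall back */
}

static void jit_compile_and_run(const char *ir, size_t len,
                                const char *kernel, void *args)
{
    (void)ir; (void)len; (void)args;
    printf("JIT-compiling and running %s\n", kernel);
}

static void run_host_fallback(const char *kernel, void *args)
{
    (void)args;
    printf("running host version of %s\n", kernel);
}

/* The special "launch" call: first estimate whether offloading pays off,
 * and only then optimize/compile/execute the kernel. */
void launch(const char *kernel, void *args)
{
    if (worth_offloading(kernel, args))
        jit_compile_and_run(kernel_ir, kernel_ir_len, kernel, args);
    else
        run_host_fallback(kernel, args);  /* host code always exists */
}
------------------------------------------------------------------------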
I'm not convinced that having multi-module IR files is the way to go. It
just seems like a lot of infrastructure/design work for little gain. Can
the embedded modules have embedded modules themselves? How deep can this
go? If not, then the embedded LLVM IR language is really a subset of the
full LLVM IR language. How do you share variables between parent and
embedded modules?

I feel that this can be better solved by just using separate IR modules.
For your purposes, the pass that generates the device code can simply
create a new module, and the host code can refer to the generated code by
name. Then, you can run each module through opt and llc individually, and
link them together somehow, like Dmitry's use of ELF symbols/sections.
This is exactly how CUDA binaries work; device code is embedded into the
host binary as special ELF sections. This would be a bit more work on the
part of your toolchain to make sure opt and llc are executed for each
produced module, but the changes are far fewer than supporting sub-modules
in a single IR file. This also has the benefit that you do not need to
change LLVM at all for this to work.

Is there some particular use case that just won't work without sub-module
support? I know you like using the example of "clang -o - | opt -o - |
llc", but I'm just not convinced that retaining the ability to pipe tools
like that is justification enough to change such a fundamental part of the
LLVM system.

On Thu, Jul 26, 2012 at 6:42 AM, Dmitry N. Mikushin <maemarcus at gmail.com> wrote:

> [...]

--
Thanks,
Justin Holewinski
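As a rough illustration of the separate-module route Justin suggests, a
pass could build and serialize a second module roughly like this. This is
a sketch only, using the LLVM-C API; the kernel is left as a bare
declaration where a real extraction pass would clone the extracted body:

------------------------------------------------------------------------
#include <llvm-c/Core.h>
#include <llvm-c/BitWriter.h>

int main(void)
{
    /* Build a separate module for the device code. */
    LLVMModuleRef device = LLVMModuleCreateWithName("device_kernels");
    LLVMSetTarget(device, "nvptx64-unknown-unknown");

    /* Declare the kernel; a real pass would clone its body here. */
    LLVMTypeRef params[] = { LLVMPointerType(LLVMInt8Type(), 0) };
    LLVMAddFunction(device, "gpu_kernel",
                    LLVMFunctionType(LLVMVoidType(), params, 1, 0));

    /* Serialize it separately; the toolchain then runs opt/llc on this
     * file and embeds the result in the host binary, e.g. as an ELF
     * section, the way CUDA fat binaries do. */
    LLVMWriteBitcodeToFile(device, "device_kernels.bc");
    LLVMDisposeModule(device);
    return 0;
}
------------------------------------------------------------------------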
Hi Dmitry,

> In our project we combine regular binary code and LLVM IR code for
> kernels, embedded as a special data symbol of the ELF object. The LLVM IR
> for a kernel existing at compile time is preliminary, and may be
> optimized further at runtime (pointer analysis, Polly, etc.). During
> application startup, the runtime system builds an index of all kernel
> sources embedded in the executable. Host and kernel code interact by
> means of a special "launch" call, which does not simply optimize,
> compile, and execute the kernel, but first estimates whether that is
> worthwhile, or whether it is better to fall back to the equivalent host
> code.

in your case it doesn't sound like any modifications to what a module can
hold are needed; it's more a question of building stuff on top of the
existing infrastructure.

> The proposal made by Tobias is very elegant, but it seems to address the
> case where host and sub-architecture code exist at the same time. May I
> kindly point out that, in our experience, really efficient, deeply
> specialized sub-architecture code may simply not exist at compile time,
> while the generic baseline host code always can.

I can't help feeling that Tobias is reinventing "tar", only upside down,
and rather than stuffing an archive inside modules he should be stuffing
modules inside an archive. But most likely I just completely failed to
understand where he's going.

Ciao, Duncan.
"Dmitry N. Mikushin" <maemarcus at gmail.com> writes:> Proposal made by Tobias is very elegant, but it seems to be addressing > the case when host and sub-architectures' code exist in the same time.I don't know why that would have to be the case. Couldn't your accelerator backend simply read in the proposed IR string and optimize/codegen it?> May I kindly point out that to our experience the really efficient > deeply specialized sub-architectures code may simply not exist at > compile time, while the generic baseline host code always can.As I mentioned earlier, I am more concerned about the case where there is no accelerator compiler executed at runtime. All the code for the host and accelerate needs to be available in native format at run time. A string representation in the object file doesn't allow that. -Dave
> Couldn't your accelerator backend simply read in the proposed IR string
> and optimize/codegen it?

Sure, it can, but that IR is a long way from the final target-specific IR
to be specialized at runtime. And in the proposed design both host and
accelerator code seem to be intended for codegen before application
execution. This is not always the case; moreover, it implicitly narrows
the visible scope of use for Polly, which is much more powerful and can
also work together with a JIT.

- D.

2012/7/26 <dag at cray.com>

> [...]
On 07/26/2012 04:12 PM, Dmitry N. Mikushin wrote:

> [...]
>
> The proposal made by Tobias is very elegant, but it seems to address the
> case where host and sub-architecture code exist at the same time. May I
> kindly point out that, in our experience, really efficient, deeply
> specialized sub-architecture code may simply not exist at compile time,
> while the generic baseline host code always can.

Hi Dmitry,

the proposal did not mean to say that all code needs to be optimized and
target code generated at compile time. You may very well retain some
kernels as LLVM-IR code and pass this code to your runtime system (similar
to how CUDA or OpenCL currently accept kernel code).

Btw, one question I always wanted to ask: what is the benefit of having
the kernel embedded as a data symbol in the ELF object, in contrast to
having it as a global variable (which is then passed to the run-time)? I
know Cell used mainly ELF symbols, but e.g. OpenCL reads kernels by
passing a pointer to the kernel string to the run-time library. Can you
point out the difference to me?

Cheers and thanks
Tobi
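For comparison, the OpenCL-style flow Tobias refers to looks roughly like
this; a minimal sketch with error handling omitted and the kernel body
elided:

------------------------------------------------------------------------
#include <CL/cl.h>

/* Kernel source as an ordinary global string in the host binary. */
static const char *kernel_src =
    "__kernel void gpu_kernel(__global char *array) { /* ... */ }";

cl_kernel build_kernel(cl_context ctx, cl_device_id dev)
{
    cl_int err;
    /* The run-time library receives a pointer to the kernel string... */
    cl_program prog =
        clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
    /* ...and compiles it for the target device at run time. */
    clBuildProgram(prog, 1, &dev, "", NULL, NULL);
    return clCreateKernel(prog, "gpu_kernel", &err);
}
------------------------------------------------------------------------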
On 07/26/2012 04:49 PM, Justin Holewinski wrote:

> I'm not convinced that having multi-module IR files is the way to go. It
> just seems like a lot of infrastructure/design work for little gain. Can
> the embedded modules have embedded modules themselves? How deep can this
> go? If not, then the embedded LLVM IR language is really a subset of the
> full LLVM IR language. How do you share variables between parent and
> embedded modules?

I don't have final answers to these questions, but here are my current
thoughts: I do not see a need for deeply nested modules, but I also don't
see a big problem. Variables between parent and embedded modules are not
shared; they are within separate address spaces.

> I feel that this can be better solved by just using separate IR modules.
> [...]
>
> Is there some particular use case that just won't work without
> sub-module support? I know you like using the example of "clang -o - |
> opt -o - | llc", but I'm just not convinced that retaining the ability
> to pipe tools like that is justification enough to change such a
> fundamental part of the LLVM system.

As I mentioned to Duncan, I agree with you that for a specific tool chain
the approach you mention is probably best. However, I am aiming for a more
generic approach: optimizer plugins that can be used in various LLVM-based
compilers, without the need for larger changes to each of these compilers.
Do you think that is a useful goal?

Thanks for your feedback
Tobi
Hi Tobias,

> What is the benefit of having the kernel embedded as a data symbol in the
> ELF object, in contrast to having it as a global variable?

This is for the conventional link step & LTO. During compilation we allow
kernels to depend on each other, which is resolved during linking. The
whole process is built on top of gcc and its existing collect2/lto1
mechanisms. As a result, we have hybrid objects/libraries/binaries
containing two independent representations: the regular binary output from
gcc and a set of LLVM IR kernels operated through their own entry points.
And now the code is not in the data section, but in a special one, similar
to __gnu_lto_v1 for gcc's LTO.

One question I realized while replying to yours: do you see your team
focusing more on infrastructure, or on polyhedral analysis development?
The quality of CLooG/Polly is what we ultimately rely on in the _first_
place. All other things are _a_lot_ simpler. You will see: ecosystems,
applications, and testbeds will grow by themselves, once the core concept
is strong. There is probably no need to spend resources on leading the way
for them in engineering topics. But they may wither soon if the math does
not grow at the same speed. Just an opinion.

Best,
- D.

2012/7/29 Tobias Grosser <tobias at grosser.es>

> [...]
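For readers unfamiliar with the mechanism Dmitry describes: with gcc or
clang, such a blob can be placed in a dedicated ELF section via a section
attribute. A minimal sketch, where the section name, symbol name, and IR
contents are all made up:

------------------------------------------------------------------------
/* Put the kernel's LLVM IR into a dedicated ELF section, analogous to
 * gcc's __gnu_lto_v1 section for LTO. "used" keeps the otherwise
 * unreferenced blob from being optimized away. At startup, the runtime
 * can scan such sections to build its kernel index. */
__attribute__((section(".kernel_llvm_ir"), used))
static const char gpu_kernel_ir[] =
    "; ModuleID = 'gpu_kernel'"
    "  ...";
------------------------------------------------------------------------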