thr3ads.net - llvm dev - [llvm-dev] [RFC] Heterogeneous LLVM-IR Modules [Jul 2020]


Johannes Doerfert via llvm-dev

2020-Jul-30 12:57 UTC

[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

[off topic] I'm not a fan of the "reply-to-list" default.


Thanks for the feedback! More below.


On 7/30/20 6:01 AM, David Chisnall via llvm-dev wrote:
> On 28/07/2020 07:00, Johannes Doerfert via llvm-dev wrote:
>> TL;DR
>> -----
>>
>> Let's allow merging two LLVM-IR modules for different targets (with
>> compatible data layouts) into a single LLVM-IR module to facilitate
>> host-device code optimizations.
>
> I think it's worth taking a step back here and thinking through the 
> problem.  The proposed solution makes me nervous because it is quite a 
> significant change to the compiler flow that comes from thinking of 
> heterogeneous optimisation as a fat LTO problem, when to me it feels
> more like a thin LTO problem.
>
> At the moment, there's an implicit assumption that everything in a 
> Module will flow to the same CodeGen back end.  It can make global 
> assumptions about cost models, can inline everything, and so on.
>

FWIW, I would expect that we split the module *before* the codegen stage
such that the back end doesn't have to deal with heterogeneous models
(right now).
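To make the "split before codegen" step concrete, here is a toy Python model, not LLVM's API: a heterogeneous module is assumed to be a flat list of functions, each tagged with a target triple, and splitting partitions it into one single-target module per triple, so each back end only sees code for its own target. The `Function` and `Module` classes and `split_by_target` are hypothetical names for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    triple: str  # hypothetical per-function target tag, e.g. "nvptx64-nvidia-cuda"


@dataclass
class Module:
    functions: list = field(default_factory=list)


def split_by_target(module):
    """Partition a heterogeneous module into one single-target module per triple.

    Meant to run after the (heterogeneous) optimization pipeline and before
    codegen, so each back end receives only functions for its own target.
    """
    parts = defaultdict(Module)
    for fn in module.functions:
        parts[fn.triple].functions.append(fn)
    return dict(parts)


# Usage: host code and a device kernel living in the same (toy) module.
hetero = Module([
    Function("main", "x86_64-unknown-linux-gnu"),
    Function("kernel", "nvptx64-nvidia-cuda"),
    Function("launch_kernel", "x86_64-unknown-linux-gnu"),
])
per_target = split_by_target(hetero)
for triple, mod in per_target.items():
    print(triple, [fn.name for fn in mod.functions])
```

A real split would also have to handle cross-target references (for instance, host-side launch code naming the kernel), which this toy ignores.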

I'm not sure about cost models and such though. As far as I know, we 
don't do global decisions anywhere but I might be wrong. Put 
differently, I hope we don't do global decisions as it seems quite easy 
to disturb the result with unrelated code changes.

> It sounds as if we have a couple of use cases:
>
>  - Analysis flow between modules
>  - Transforms that modify two modules
>

Yes! Notably, the first bullet is bi-directional and cyclic ;)

> The first case is where the motivating example of constant
> propagation fits. Here, something like ThinLTO feels like the right
> approach: you can collect in one module the fact that a kernel is
> invoked only with specific constant arguments in the host module and
> consume that result in the target module.
>

Except that you can have cyclic dependencies, which makes this
problematic again. You might not propagate constants from the device
module to the host one, but whether memory is only read/written on the
device is very interesting on the host side: you can avoid memory
copies, remove globals, etc. That is just what comes to mind right away.
The proposed heterogeneous modules should not limit you to "monolithic
LTO", or "thin LTO" for that matter.
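A toy Python sketch of the one-directional, summary-based flow suggested above: the host side records, per kernel, which arguments are the same constant at every launch, and the device side only consumes that summary. Launch sites are modeled as hypothetical `(kernel, args)` tuples, with `None` standing for a value that is not a compile-time constant; this is not ThinLTO's actual summary format. Information flows from host to device only, which is why the cyclic cases mentioned above (device memory behavior feeding back into host optimizations) do not fit this shape without extra rounds.

```python
def summarize_kernel_args(launches):
    """Host side: per kernel, keep an argument's value only if every launch passes the same constant."""
    summary = {}
    for kernel, args in launches:
        if kernel not in summary:
            summary[kernel] = list(args)
            continue
        known = summary[kernel]
        for i, value in enumerate(args):
            if known[i] != value:
                known[i] = None  # launches disagree (or value unknown): not a constant
    return {kernel: tuple(values) for kernel, values in summary.items()}


def constants_for_kernel(kernel, params, summary):
    """Device side: map parameter names to the constants the host summary proved."""
    values = summary.get(kernel, (None,) * len(params))
    return {p: v for p, v in zip(params, values) if v is not None}


# Usage: hypothetical host-side launches of two kernels.
launches = [
    ("saxpy", (1024, None)),
    ("saxpy", (1024, None)),
    ("scale", (2, 8)),
    ("scale", (3, 8)),
]
summary = summarize_kernel_args(launches)
print(summary)
print(constants_for_kernel("scale", ["factor", "n"], summary))
```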

> The second example is what you'd need for things like kernel fusion, 
> where you need to both combine two kernels in the target module and 
> also modify the callers to invoke the single kernel and skip some data 
> flow. For this, you need a kind of pass that can work over things that 
> begin in two modules.
>

Right. Splitting, fusing, moving code, etc. all require you to modify
both modules at the same time. Even if you only modify one module, you
want information from both, in either direction.

> It seems that a less invasive change would be:
>
>  - Use ThinLTO metadata for the first case, extend it as required.
>  - Add a new kind of ModuleSetPass that takes a set of Modules and is 
> allowed to modify both.
>
> This avoids any modifications for the common (single-target) case, but 
> should give you the required functionality.  Am I missing something?
>

This is similar to what Renato suggested early on. In addition to the
"ThinLTO metadata" inefficiencies outlined above, the problem I have
with the second part is that it requires writing completely new passes
in a style different from anything we have. It is certainly a
possibility, but we can probably do it without any changes to the
infrastructure.
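For comparison, the `ModuleSetPass` idea can be sketched as a toy Python model of kernel fusion: the pass receives every module at once and rewrites both, fusing two device kernels and collapsing the matching back-to-back host launches. The `ModuleSetPass` interface, the `Module` fields, and the fused-kernel naming scheme are all hypothetical; nothing here is an existing LLVM class. The pass only commits when every launch of the two kernels can be fused, so it never deletes a kernel the host still launches on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    kernels: set = field(default_factory=set)     # device side: kernels defined here
    launches: list = field(default_factory=list)  # host side: kernel launches, in program order


class ModuleSetPass:
    """Hypothetical interface: run() sees all modules and may modify any of them."""

    def run(self, modules):
        raise NotImplementedError


class FuseKernelsPass(ModuleSetPass):
    def __init__(self, first, second):
        self.first, self.second = first, second
        self.fused = f"{first}_{second}_fused"

    def run(self, modules):
        host, device = modules["host"], modules["device"]
        if not {self.first, self.second} <= device.kernels:
            return False
        # Host side: replace each back-to-back launch of (first, second) with one fused launch.
        rewritten, i = [], 0
        while i < len(host.launches):
            if host.launches[i:i + 2] == [self.first, self.second]:
                rewritten.append(self.fused)
                i += 2
            else:
                rewritten.append(host.launches[i])
                i += 1
        # Bail out unchanged if either kernel is still launched on its own.
        if self.first in rewritten or self.second in rewritten:
            return False
        # Commit the change to both modules together.
        host.launches = rewritten
        device.kernels -= {self.first, self.second}
        device.kernels.add(self.fused)
        return True


host = Module(launches=["init", "a", "b", "a", "b"])
device = Module(kernels={"init", "a", "b"})
changed = FuseKernelsPass("a", "b").run({"host": host, "device": device})
print(changed, host.launches, sorted(device.kernels))
```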

In addition to the analysis/optimization infrastructure reasons, I would
like to point out that this would make our toolchains a lot simpler. We
have some embedding of device code in host code right now (on every
level), and things like LTO for all offloading models would become much
easier if we distributed the heterogeneous modules instead of yet another
embedding. I might be biased by the way the "clang offload bundler" is
used right now for OpenMP, HIP, etc., but I would very much like to
replace that with a "clean" toolchain that performs as much LTO as
possible, at least for the accelerator code.

I hope this makes some sense, feel free to ask questions :)


~ Johannes


> David
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Renato Golin via llvm-dev

2020-Jul-30 13:48 UTC


[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

On Thu, 30 Jul 2020 at 13:59, Johannes Doerfert via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> FWIW, I would expect that we split the module *before* the codegen stage
> such that the back end doesn't have to deal with heterogeneous models
> (right now).

Indeed. Even if the multiple targets are all supported by the same
back-end (ex. different Arm families), the target info decisions are
too ingrained in how we created the back-ends to be easy (or even
possible) to split.
> I'm not sure about cost models and such though. As far as I know, we
> don't do global decisions anywhere but I might be wrong. Put
> differently, I hope we don't do global decisions as it seems quite easy
> to disturb the result with unrelated code changes.

Target info (ex. TTI) are dependent on triple + hidden parameters
(passed down from the driver as target options), which are global.

As I said before, having multiple target triples in the source will
not change that, and we'll have to create multiple groups of driver
flags, applicable to different triples. Or we'll need to merge modules
from different front-ends, in which case this looks more and more like
LTO.

This will not be trivial to map and the data layout does not reflect
any of that.

cheers,
--renato

Johannes Doerfert via llvm-dev

2020-Jul-30 14:08 UTC


[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

On 7/30/20 8:48 AM, Renato Golin wrote:
> On Thu, 30 Jul 2020 at 13:59, Johannes Doerfert via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> FWIW, I would expect that we split the module *before* the codegen stage
>> such that the back end doesn't have to deal with heterogeneous models
>> (right now).
> Indeed. Even if the multiple targets are all supported by the same
> back-end (ex. different Arm families), the target info decisions are
> too ingrained in how we created the back-ends to be easy (or even
> possible) to split.

Right, and I don't see the need to generate code "together" ;)

>> I'm not sure about cost models and such though. As far as I know, we
>> don't do global decisions anywhere but I might be wrong. Put
>> differently, I hope we don't do global decisions as it seems quite easy
>> to disturb the result with unrelated code changes.
> Target info (ex. TTI) are dependent on triple + hidden parameters
> (passed down from the driver as target options), which are global.
>
> As I said before, having multiple target triples in the source will
> not change that, and we'll have to create multiple groups of driver
> flags, applicable to different triples. Or we'll need to merge modules
> from different front-ends, in which case this looks more and more like
> LTO.
>
> This will not be trivial to map and the data layout does not reflect
> any of that.

So in addition to multiple target triples and DLs, we would probably want
multiple target info objects, correct?

At this point I ask myself if it wouldn't be better to make the target
CPU, features, and other "hidden parameters" explicit in the module
itself. (I suggested part of that recently anyway [0].) That way we could
create the proper target info from the IR, which seems to me like
something valuable even in the current single-target setting.

I mean, wouldn't that allow us to make `clang -emit-llvm` followed by 
`opt` behave more like a single `clang` invocation?

If so, that seems desirable ;)
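A toy Python sketch of deriving target info from the IR alone, assuming each (sub)module records its triple, CPU, and feature string as attributes. LLVM already attaches `target-cpu` and `target-features` as per-function attributes; the module-level dictionary, the `TargetInfo` class, and the `"generic"` fallback here are hypothetical simplifications. With this information in the module, building one target info object per triple needs no driver flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetInfo:
    triple: str
    cpu: str
    features: frozenset


def target_info_from_attrs(attrs):
    """Build target info purely from attributes stored in the module (no driver flags)."""
    features = frozenset(f for f in attrs.get("target-features", "").split(",") if f)
    # Assumed fallback when the module records no CPU.
    return TargetInfo(attrs["triple"], attrs.get("target-cpu", "generic"), features)


# Usage: hypothetical attributes recorded for the host and device parts of one module.
submodules = [
    {"triple": "x86_64-unknown-linux-gnu", "target-cpu": "skylake", "target-features": "+avx2,+fma"},
    {"triple": "nvptx64-nvidia-cuda", "target-cpu": "sm_70"},
]
infos = {a["triple"]: target_info_from_attrs(a) for a in submodules}
for info in infos.values():
    print(info)
```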


~ Johannes



[0] https://reviews.llvm.org/D80750#2180284

>
> cheers,
> --renato
