thr3ads.net - llvm dev - [llvm-dev] [RFC] Heterogeneous LLVM-IR Modules [Jul 2020]

Johannes Doerfert via llvm-dev

2020-Jul-30 16:44 UTC

[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

On 7/30/20 11:11 AM, Renato Golin wrote:
 > On Thu, 30 Jul 2020 at 16:58, Johannes Doerfert
 > <johannesdoerfert at gmail.com> wrote:
 >> I mean, you can put the command line string that set the options into
 >> the first place, right? That is as long as it initially was, or maybe I
 >> am missing something.
 >
 > Options change with time, and this would make the IR incompatible
 > across releases without intentionally doing so.

You could arguably be forgiving when it comes to parsing these, so
you might lose some if you mix IR across releases, but right now you
cannot express this at all. I mean, IR looks as if it captures the
entire state, but not quite. As a use case, the question of how to
reproduce `clang -O3` with opt comes up every month or so on the list.
Let's table this for now as it seems unrelated to this proposal.
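
A minimal sketch of what "forgiving" parsing could look like, assuming the
options were recorded as a plain command line string. The option names and
the `KNOWN_OPTIONS` set are hypothetical and not an actual LLVM option list;
the only point is that unknown options are dropped rather than rejected.

```python
import shlex

# Hypothetical: the options that "this release" still understands.
KNOWN_OPTIONS = {"-O3", "-example-threshold"}


def parse_recorded_options(recorded):
    """Split a recorded command line into (accepted, dropped) options.

    Options this release does not know are dropped instead of causing an
    error, so IR from another release still loads, minus some options.
    """
    accepted, dropped = [], []
    for token in shlex.split(recorded):
        name = token.split("=", 1)[0]  # "-foo=1" -> "-foo"
        if name in KNOWN_OPTIONS:
            accepted.append(token)
        else:
            dropped.append(token)
    return accepted, dropped


accepted, dropped = parse_recorded_options(
    "-O3 -example-threshold=500 -removed-in-later-release=1"
)
print("accepted:", accepted)
print("dropped: ", dropped)
```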

 >> To recap things that might "differ" from the original proposal:
 >>    - We want multiple target triples.
 >>    - We probably want multiple data layouts.
 >>    - We probably want multiple pass pipelines, with different (cmd
 >>      line) options and such.
 >>    - We might want to make modules self contained wrt. target options
 >>      such that you can create TTI and friends w/o repeating driver
 >>      options.
 >
 > The extent of the separation is what made me suggest that it might be
 > easier, in the end, to carry multiple modules, from different
 > front-ends, through multiple pipelines but interacting with each
 > other.
 >
 > I guess this is why David made a parallel with LTO, as this ends up as
 > being a multi-device LTO in a sense. I think that will be easier and
 > much less intrusive than rewriting the global context, target flags,
 > IR annotation, data layout assumptions, target triple parsing, target
 > options bundling, etc.

It is definitely multi-device (link time) optimization. The link
time part is somewhat optional and might be misleading given the
popularity of single source programming models for accelerators. The
"thinLTO" idea would also not be sufficient for everything we hope to
do; the two module approach would be, though.

What if we don't rewrite these things but still merge the modules?
Let me explain ;)

(I use `opt` invocations below as a placeholder, for lack of a better
  term, knowing that it is not (only) the `opt` tool we are talking about.)

The problem is that the `opt` invocation is primed for a single target,
everything (=pipeline, TTI, flags, ...) exists only once, right?
I imagine the two module approach to run two `opt` invocations, one for
each module, which we would synchronize at some point to do cross-module
optimizations. Given that we can run two `opt` invocations and we assume
a pass can work with two modules, that is two sets of everything, why do
we need the separation? From a tooling perspective I think it makes
things easier to have a single module. That said, it should not preclude
us from running two separate `opt` invocations on it. So we don't rewrite
everything but instead "just" need to duplicate all the information in
the IR such that each `opt` invocation can extract its respective set
of values and run on the respective set of global symbols. This would
reduce the new stuff to more or less what we started with: device triple
& DL, and a way to link a global symbol to a device triple & DL. It is the
two module approach but with "co-located" modules ;)
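
To make the "co-located" idea concrete, here is a rough Python model of it.
This is not LLVM code: `CoLocatedModule`, `TargetInfo`, and `view_for` are
made-up names, and the data layout strings are placeholders. It only shows
one container holding several (triple, DL) pairs, with every global symbol
tied to one of them, so each per-target invocation can pull out its own set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TargetInfo:
    triple: str
    data_layout: str  # placeholder strings below, not real data layouts


@dataclass
class CoLocatedModule:
    # target key -> TargetInfo (the duplicated "global" information)
    targets: dict = field(default_factory=dict)
    # (symbol name, target key) pairs: each global is tied to one target
    symbols: list = field(default_factory=list)

    def add_target(self, key, triple, data_layout):
        self.targets[key] = TargetInfo(triple, data_layout)

    def add_symbol(self, name, target_key):
        if target_key not in self.targets:
            raise KeyError(f"unknown target {target_key!r}")
        self.symbols.append((name, target_key))

    def view_for(self, target_key):
        """What a single-target invocation would extract: its own
        triple and DL, plus only the symbols tied to that target."""
        info = self.targets[target_key]
        names = [n for n, k in self.symbols if k == target_key]
        return info, names


module = CoLocatedModule()
module.add_target("host", "x86_64-unknown-linux-gnu", "<host data layout>")
module.add_target("device", "nvptx64-nvidia-cuda", "<device data layout>")
module.add_symbol("main", "host")
module.add_symbol("kernel", "device")
module.add_symbol("launch_kernel", "host")

host_info, host_symbols = module.view_for("host")
device_info, device_symbols = module.view_for("device")
print(host_info.triple, host_symbols)
print(device_info.triple, device_symbols)
```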

WDYT?

~ Johannes

P.S. This is really helpful but I won't give up so easily on the idea.
      If I do, I have to implement cross module optimizations and I would
      rather not ;)

Renato Golin via llvm-dev

2020-Jul-30 17:41 UTC

On Thu, 30 Jul 2020 at 17:46, Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:
> So we don't rewrite
> everything but instead "just" need to duplicate all the information in
> the IR such that each `opt` invocation can extract its respective set
> of values and run on the respective set of global symbols. This would
> reduce the new stuff to more or less what we started with: device triple
> & DL, and a way to link a global symbol to a device triple & DL. It is the
> two module approach but with "co-located" modules ;)

I think you're being overly optimistic in hoping the "triple+DL"
representation will be enough to emulate multi-target.

It may work for the cases you care about but it will create a host of
corner cases that the community will have to maintain with no tangible
additional benefit to a large portion of it.

But I'd like to hear the opinion of others on the subject before
making up my own mind...

--renato

Johannes Doerfert via llvm-dev

2020-Jul-30 18:02 UTC

On 7/30/20 12:41 PM, Renato Golin wrote:
 > On Thu, 30 Jul 2020 at 17:46, Johannes Doerfert
 > <johannesdoerfert at gmail.com> wrote:
 >> So we don't rewrite
 >> everything but instead "just" need to duplicate all the information in
 >> the IR such that each `opt` invocation can extract its respective set
 >> of values and run on the respective set of global symbols. This would
 >> reduce the new stuff to more or less what we started with: device triple
 >> & DL, and a way to link a global symbol to a device triple & DL. It is the
 >> two module approach but with "co-located" modules ;)
 >
 > I think you're being overly optimistic in hoping the "triple+DL"
 > representation will be enough to emulate multi-target.

I might be optimistic about the impact such a representation has, but I
don't see why it would not be enough. I argue we keep *all* the
information that we currently have in two modules but put it in a single
one. What else is there (in the IR)? With two `opt` invocations you
don't miss out on flags and target information either. At the end of the
day I am suggesting to have a single `llvm::Module` with the same
information that was in multiple ones before. That will require us to
allow duplicates for all "global" entities (triple, DL, module metadata,
...) and to tie global symbols to such entities.

Summarized, this approach would keep the modules logically separate
(e.g., we would have different namespaces for globals), keep the pass
pipelines actually separate, and only co-locate the representation to
simplify tooling in various places. For a single-device invocation of
`opt`, the module should not "behave" any differently from the one you
get if you extract the code for that device first (or do not merge it in
the first place).
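
The property described above can be stated as a round trip: merging two
modules and then extracting one device's part should give back the module
you started with. The sketch below is a toy model under that assumption.
`SingleTargetModule`, `merge`, and `extract` are hypothetical names, not
LLVM APIs, and function bodies are stand-in strings. Note that both parts
define a global called `helper` without clashing, since each part keeps its
own namespace.

```python
from dataclasses import dataclass, field


@dataclass
class SingleTargetModule:
    triple: str
    data_layout: str  # placeholder strings below, not real data layouts
    global_symbols: dict = field(default_factory=dict)  # name -> body


@dataclass
class MergedModule:
    # triple -> that target's part, kept logically separate
    parts: dict = field(default_factory=dict)


def merge(*modules):
    """Co-locate several single-target modules in one container."""
    merged = MergedModule()
    for m in modules:
        if m.triple in merged.parts:
            raise ValueError(f"duplicate triple {m.triple!r}")
        merged.parts[m.triple] = SingleTargetModule(
            m.triple, m.data_layout, dict(m.global_symbols)
        )
    return merged


def extract(merged, triple):
    """Return a standalone copy of the part for one target."""
    part = merged.parts[triple]
    return SingleTargetModule(
        part.triple, part.data_layout, dict(part.global_symbols)
    )


host = SingleTargetModule(
    "x86_64-unknown-linux-gnu", "<host data layout>",
    {"main": "<host body>", "helper": "<host helper body>"},
)
device = SingleTargetModule(
    "nvptx64-nvidia-cuda", "<device data layout>",
    {"kernel": "<device body>", "helper": "<device helper body>"},
)

merged = merge(host, device)
# Extracting a device's part gives back what went in, unchanged.
print(extract(merged, host.triple) == host)
print(extract(merged, device.triple) == device)
```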

 > It may work for the cases you care about but it will create a host of
 > corner cases that the community will have to maintain with no tangible
 > additional benefit to a large portion of it.

I would like to think that a large portion of the community actually
benefits; at least everyone that cares about accelerators, which is a
growing fraction, I would assume. The reason I started this thread is to
discuss use and corner cases, and I think it is working so far. If you feel
there are (known) corner cases that I somehow ignore, please let me
know. Similarly, I don't want to ignore any use case that is brought
forward. However, I am aware that the interesting corner cases might yet
be unknown and will only reveal themselves as we go.

 > But I'd like to hear the opinion of others on the subject before
 > making up my own mind...

I think more input would be great :)

~ Johannes
