Johannes Doerfert via llvm-dev
2020-Jul-30 16:44 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
On 7/30/20 11:11 AM, Renato Golin wrote:
> On Thu, 30 Jul 2020 at 16:58, Johannes Doerfert
> <johannesdoerfert at gmail.com> wrote:
>> I mean, you can put the command line string that set the options into
>> the first place, right? That is as long as it initially was, or maybe I
>> am missing something.
>
> Options change with time, and this would make the IR incompatible
> across releases without intentionally doing so.

You could arguably be forgiving when it comes to the parsing of these,
so you might lose some if you mix IR across releases, but right now you
cannot express this at all. I mean, IR looks as if it captures the
entire state, but not quite. As a use case, the question of how to
reproduce `clang -O3` with opt comes up every month or so on the list.

Let's table this for now as it seems unrelated to this proposal.

>> To recap things that might "differ" from the original proposal:
>> - We want multiple target triples.
>> - We probably want multiple data layouts.
>> - We probably want multiple pass pipelines, with different (cmd
>>   line) options and such.
>> - We might want to make modules self contained wrt. target options
>>   such that you can create TTI and friends w/o repeating driver
>>   options.
>
> The extent of the separation is what made me suggest that it might be
> easier, in the end, to carry multiple modules, from different
> front-ends, through multiple pipelines but interacting with each
> other.
>
> I guess this is why David made a parallel with LTO, as this ends up as
> being a multi-device LTO in a sense. I think that will be easier and
> much less intrusive than rewriting the global context, target flags,
> IR annotation, data layout assumptions, target triple parsing, target
> options bundling, etc.

It is definitely multi-device (link time) optimization. The link time
part is somewhat optional and might be misleading given the popularity
of single source programming models for accelerators.
The "thinLTO" idea would also not be sufficient for everything we hope
to do; the two module approach would be, though.

What if we don't rewrite these things but still merge the modules?
Let me explain ;)

(I use `opt` invocations below as a placeholder for lack of a better
term, knowing it is not (only) the `opt` tool we are talking about.)

The problem is that the `opt` invocation is primed for a single target;
everything (=pipeline, TTI, flags, ...) exists only once, right?
I imagine the two module approach to run two `opt` invocations, one for
each module, which we would synchronize at some point to do cross-module
optimizations. Given that we can run two `opt` invocations and we assume
a pass can work with two modules, that is, two sets of everything, why do
we need the separation?

From a tooling perspective I think it makes things easier to have a
single module. That said, it should not preclude us from running two
separate `opt` invocations on it. So we don't rewrite everything but
instead "just" need to duplicate all the information in the IR such that
each `opt` invocation can extract its respective set of values and run
on the respective set of global symbols. This would reduce the new stuff
to more or less what we started with: device triple & DL, and a way to
link a global symbol to a device triple & DL. It is the two module
approach but with "co-located" modules ;)

WDYT?

~ Johannes

P.S. This is really helpful but I won't give up so easily on the idea.
If I do, I have to implement cross module optimizations and I would
rather not ;)
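
The "co-located modules" idea in the message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not LLVM code: `TargetInfo`, `HeterogeneousModule`, `add_target`, `add_symbol` and `view_for` are hypothetical names invented for illustration, and the data layout strings are placeholders rather than real layouts. It shows one container holding several triple/DL pairs, each global symbol tied to one of them, and a per-target view of the kind a single-target `opt`-like invocation would extract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TargetInfo:
    """Hypothetical per-device record: what a module stores only once today."""
    triple: str
    data_layout: str  # placeholder text below, not a real layout string


@dataclass
class GlobalSymbol:
    name: str
    target: int  # index into HeterogeneousModule.targets


@dataclass
class HeterogeneousModule:
    """Toy stand-in for one module that co-locates several logical modules."""
    targets: list = field(default_factory=list)
    symbols: list = field(default_factory=list)

    def add_target(self, triple, data_layout):
        """Register a triple/DL pair and return its index."""
        self.targets.append(TargetInfo(triple, data_layout))
        return len(self.targets) - 1

    def add_symbol(self, name, target):
        """Tie a global symbol to one registered target."""
        if not 0 <= target < len(self.targets):
            raise ValueError(f"unknown target index {target}")
        self.symbols.append(GlobalSymbol(name, target))

    def view_for(self, target):
        """Return the target info and symbol names one invocation would see."""
        info = self.targets[target]
        names = [s.name for s in self.symbols if s.target == target]
        return info, names


def build_example():
    """Build a two-target module with host and device symbols."""
    m = HeterogeneousModule()
    host = m.add_target("x86_64-unknown-linux-gnu", "host-dl-placeholder")
    dev = m.add_target("nvptx64-nvidia-cuda", "device-dl-placeholder")
    m.add_symbol("main", host)
    m.add_symbol("kernel", dev)
    m.add_symbol("launch_kernel", host)
    return m, host, dev
```

In this model the only new pieces are the list of targets and the per-symbol target index, which mirrors the claim that the additions reduce to "device triple & DL, and a way to link a global symbol to a device triple & DL".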
Renato Golin via llvm-dev
2020-Jul-30 17:41 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
On Thu, 30 Jul 2020 at 17:46, Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:
> So we don't rewrite
> everything but instead "just" need to duplicate all the information in
> the IR such that each `opt` invocation can extract its respective set
> of values and run on the respective set of global symbols. This would
> reduce the new stuff to more or less what we started with: device triple
> & DL, and a way to link global symbol to a device triple & DL. It is the
> two module approach but with "co-located" modules ;)

I think you're being overly optimistic in hoping the "triple+DL"
representation will be enough to emulate multi-target.

It may work for the cases you care about but it will create a host of
corner cases that the community will have to maintain with no tangible
additional benefit to a large portion of it.

But I'd like to hear the opinion of others on the subject before
making up my own mind...

--renato
Johannes Doerfert via llvm-dev
2020-Jul-30 18:02 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
On 7/30/20 12:41 PM, Renato Golin wrote:
> On Thu, 30 Jul 2020 at 17:46, Johannes Doerfert
> <johannesdoerfert at gmail.com> wrote:
>> So we don't rewrite
>> everything but instead "just" need to duplicate all the information in
>> the IR such that each `opt` invocation can extract its respective set
>> of values and run on the respective set of global symbols. This would
>> reduce the new stuff to more or less what we started with: device triple
>> & DL, and a way to link global symbol to a device triple & DL. It is the
>> two module approach but with "co-located" modules ;)
>
> I think you're being overly optimistic in hoping the "triple+DL"
> representation will be enough to emulate multi-target.

I might be optimistic about the impact such a representation has but I
don't see how it should not be enough. I argue we keep *all* the
information that we currently have in two modules but put it in a single
one. What else is there (in the IR)? With two `opt` invocations you
don't miss out on flags and target information either.

At the end of the day I am suggesting that we have a single
`llvm::Module` with the same information that was in multiple ones
before. That will require us to allow duplicates for all "global"
entities (triple, DL, module metadata, ...) and to tie global symbols to
such entities.

In summary, this approach would keep the modules logically separate
(e.g., we would have different namespaces for globals), keep the pass
pipelines actually separate, and only co-locate the representation to
simplify tooling in various places. For a single-device invocation of
`opt`, the module should not "behave" any differently than the one that
you get if you extract the code for that device first (or do not merge
it in the first place).

> It may work for the cases you care about but it will create a host of
> corner cases that the community will have to maintain with no tangible
> additional benefit to a large portion of it.
I would like to think that a large portion of the community actually
benefits; at least everyone who cares about accelerators, which is a
growing fraction I would assume.

The reason I started this thread is to discuss use cases and corner
cases; I think it is working so far. If you feel there are (known)
corner cases that I somehow ignore, please let me know. Similarly, I
don't want to ignore any use case that is brought forward. However, I am
aware that the interesting corner cases might yet be unknown and will
only reveal themselves as we go.

> But I'd like to hear the opinion of others on the subject before
> making up my own mind...

I think more input would be great :)

~ Johannes
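
The property stated in the message above (a single-device invocation should see the same thing whether or not the modules were merged) can be written down as a round-trip check on a toy model. Again, this is a sketch under assumptions and not LLVM's API: `ToyModule`, `CoLocatedModule`, `merge` and `extract` are hypothetical names, the data layout and metadata values are placeholders, and keying the parts by triple is a simplification that would not allow two devices sharing a triple.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    """Hypothetical single-target module: one triple, one DL, own globals."""
    triple: str
    data_layout: str  # placeholder text, not a real layout string
    metadata: dict = field(default_factory=dict)
    globals_: dict = field(default_factory=dict)  # name -> opaque body text


@dataclass
class CoLocatedModule:
    """Hypothetical container holding duplicates of every 'global' entity."""
    parts: dict = field(default_factory=dict)  # triple -> ToyModule


def merge(*modules):
    """Co-locate several single-target modules without mixing namespaces."""
    merged = CoLocatedModule()
    for mod in modules:
        if mod.triple in merged.parts:
            raise ValueError(f"duplicate target {mod.triple}")
        # Copy so that later changes to the merged module leave inputs alone.
        merged.parts[mod.triple] = ToyModule(
            mod.triple, mod.data_layout, dict(mod.metadata), dict(mod.globals_)
        )
    return merged


def extract(merged, triple):
    """Single-device view; expected to equal the module that was merged in."""
    part = merged.parts[triple]
    return ToyModule(
        part.triple, part.data_layout, dict(part.metadata), dict(part.globals_)
    )


def make_pair():
    """Two modules that both define `helper`, each in its own namespace."""
    host = ToyModule(
        "x86_64-unknown-linux-gnu", "host-dl-placeholder",
        {"opt-level": "3"}, {"helper": "host body", "main": "host entry"},
    )
    device = ToyModule(
        "nvptx64-nvidia-cuda", "device-dl-placeholder",
        {"opt-level": "2"}, {"helper": "device body"},
    )
    return host, device
```

With `host, device = make_pair()` and `merged = merge(host, device)`, the intended invariant is `extract(merged, host.triple) == host` and likewise for the device. Both modules can define `helper` because each part keeps its own namespace for globals.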