Johannes Doerfert via llvm-dev
2020-Jul-30 15:56 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
On 7/30/20 10:01 AM, Renato Golin wrote:
> On Thu, 30 Jul 2020 at 15:09, Johannes Doerfert
> <johannesdoerfert at gmail.com> wrote:
>> At this point I ask myself if it wouldn't be better to make the target
>> cpu, features, and other "hidden parameters" explicit in the module itself.
>> (I suggested part of that recently anyway[0].) That way we could create the
>> proper target info from the IR, which seems to me like something
>> valuable even in the current single-target setting.
>
> This is still not enough. Other driver flags exist, which have to do
> with OS and environment issues (incl. user flags) that are not part of
> the target description and can affect optimisation, codegen and even
> ABI.
>
> Some of those options apply to some targets and not others. If they
> apply to all targets you have, the user might want to apply to some
> but not all, and then how will this work at cmdline side?

I can see that we want different command line options per target in the
module. Given that we probably want to allow one pass pipeline per target,
maybe we keep the options but introduce something like a `--device=N` flag
which will apply all following options to the "N'th" pipeline. That way you
could specify things like:

  `... --inline-threshold=1234 --device=2 --inline-threshold=5678`

For TTI and such, the driver would create the appropriate version for each
target and put it in the respective pipeline, as it does now, just that
there are multiple pipelines.

My idea in the last email was to put the relevant driver options
(optionally) into the IR such that you can generate TTI and friends from
the IR alone. As far as I know, this is not possible right now. Note that
this is somewhat unrelated to heterogeneous modules but would potentially
be helpful there.

If we manifested the options, though, you could ask the driver to emit IR
with target options embedded, then use `opt` and friends to work on the
result (w/o repeating the flags) while still being able to create the same
TTI the driver would have created for you in an "end-to-end" run. (I might
not be expressing this idea properly.)

> I don't know the extent of what you can combine from all of the
> existing global options into IR annotations, but my wild guess is that
> it would explode the number of attributes, which is not a good thing.

I mean, you can put the command line string that set the options in the
first place, right? That is as long as it initially was, or maybe I am
missing something.

To recap things that might "differ" from the original proposal:
  - We want multiple target triples.
  - We probably want multiple data layouts.
  - We probably want multiple pass pipelines, with different (cmd line)
    options and such.
  - We might want to make modules self contained wrt. target options
    such that you can create TTI and friends w/o repeating driver
    options.

~ Johannes

> --renato
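The `--device=N` idea from the message above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not an existing LLVM or Clang feature: the function name `split_options_by_device` is hypothetical, and the rule that options given before any `--device=N` go to pipeline 1 is an assumption the message does not spell out.

```python
from collections import defaultdict


def split_options_by_device(argv, default_device=1):
    """Group options by the pipeline selected by the latest --device=N.

    Assumption (not stated in the thread): options that appear before
    any --device=N flag belong to `default_device`.
    """
    per_device = defaultdict(list)
    current = default_device
    for arg in argv:
        if arg.startswith("--device="):
            # Switch the pipeline that all following options apply to.
            current = int(arg.split("=", 1)[1])
        else:
            per_device[current].append(arg)
    return dict(per_device)


# The example command line from the message.
options = split_options_by_device(
    ["--inline-threshold=1234", "--device=2", "--inline-threshold=5678"]
)
```

Here `options[1]` holds the options for the first pipeline and `options[2]` those for the second, so a driver could build one set of flags (and, from it, one TTI) per target.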
Renato Golin via llvm-dev
2020-Jul-30 16:11 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
On Thu, 30 Jul 2020 at 16:58, Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:
> I mean, you can put the command line string that set the options in
> the first place, right? That is as long as it initially was, or maybe I
> am missing something.

Options change with time, and this would make the IR incompatible
across releases without intentionally doing so.

> To recap things that might "differ" from the original proposal:
>   - We want multiple target triples.
>   - We probably want multiple data layouts.
>   - We probably want multiple pass pipelines, with different (cmd
>     line) options and such.
>   - We might want to make modules self contained wrt. target options
>     such that you can create TTI and friends w/o repeating driver
>     options.

The extent of the separation is what made me suggest that it might be
easier, in the end, to carry multiple modules, from different
front-ends, through multiple pipelines but interacting with each
other.

I guess this is why David made a parallel with LTO, as this ends up as
being a multi-device LTO in a sense. I think that will be easier and
much less intrusive than rewriting the global context, target flags,
IR annotation, data layout assumptions, target triple parsing, target
options bundling, etc.

--renato
Johannes Doerfert via llvm-dev
2020-Jul-30 16:44 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
On 7/30/20 11:11 AM, Renato Golin wrote:
> On Thu, 30 Jul 2020 at 16:58, Johannes Doerfert
> <johannesdoerfert at gmail.com> wrote:
>> I mean, you can put the command line string that set the options in
>> the first place, right? That is as long as it initially was, or maybe I
>> am missing something.
>
> Options change with time, and this would make the IR incompatible
> across releases without intentionally doing so.

You could arguably be forgiving when parsing these, so you might lose some
options if you mix IR across releases, but right now you cannot express
this at all. I mean, IR looks as if it captures the entire state, but it
does not quite. As a use case, the question of how to reproduce
`clang -O3` with `opt` comes up every month or so on the list.

Let's table this for now as it seems unrelated to this proposal.

>> To recap things that might "differ" from the original proposal:
>>   - We want multiple target triples.
>>   - We probably want multiple data layouts.
>>   - We probably want multiple pass pipelines, with different (cmd
>>     line) options and such.
>>   - We might want to make modules self contained wrt. target options
>>     such that you can create TTI and friends w/o repeating driver
>>     options.
>
> The extent of the separation is what made me suggest that it might be
> easier, in the end, to carry multiple modules, from different
> front-ends, through multiple pipelines but interacting with each
> other.
>
> I guess this is why David made a parallel with LTO, as this ends up as
> being a multi-device LTO in a sense. I think that will be easier and
> much less intrusive than rewriting the global context, target flags,
> IR annotation, data layout assumptions, target triple parsing, target
> options bundling, etc.

It is definitely multi-device (link time) optimization. The link time part
is somewhat optional and might be misleading given the popularity of
single source programming models for accelerators. The "thinLTO" idea
would also not be sufficient for everything we hope to do; the two module
approach would be, though.

What if we don't rewrite these things but still merge the modules?
Let me explain ;)

(I use `opt` invocations below as a placeholder, for lack of a better
term, knowing it is not (only) the `opt` tool we are talking about.)

The problem is that the `opt` invocation is primed for a single target:
everything (= pipeline, TTI, flags, ...) exists only once, right?
I imagine the two module approach to run two `opt` invocations, one for
each module, which we would synchronize at some point to do cross-module
optimizations. Given that we can run two `opt` invocations and we assume
a pass can work with two modules, that is, two sets of everything, why do
we need the separation?

From a tooling perspective I think it makes things easier to have a
single module. That said, it should not preclude us from running two
separate `opt` invocations on it. So we don't rewrite everything but
instead "just" need to duplicate all the information in the IR such that
each `opt` invocation can extract its respective set of values and run on
the respective set of global symbols. This would reduce the new stuff to
more or less what we started with: device triple & DL, and a way to link
a global symbol to a device triple & DL.

It is the two module approach but with "co-located" modules ;)

WDYT?

~ Johannes

P.S. This is really helpful but I won't give up so easily on the idea.
     If I do, I have to implement cross module optimizations and I
     would rather not ;)
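The "co-located modules" idea above can be pictured with a small Python model. It is a sketch of the data the message lists (a triple and data layout per device, plus a link from each global symbol to a device), not LLVM's actual `Module` API: the classes `Target` and `HeterogeneousModule`, the method `view_for_device`, the symbol names, and the shortened data layout strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Target:
    triple: str
    data_layout: str  # shortened placeholder, not a full DL string


@dataclass
class HeterogeneousModule:
    # One entry per device; the list index is the device number.
    targets: list
    # Link from global symbol name to the device it belongs to.
    symbol_device: dict = field(default_factory=dict)

    def view_for_device(self, device):
        """Return the (target, symbols) one `opt`-like invocation would see."""
        target = self.targets[device]
        symbols = sorted(
            name for name, dev in self.symbol_device.items() if dev == device
        )
        return target, symbols


module = HeterogeneousModule(
    targets=[
        Target("x86_64-unknown-linux-gnu", "e-i64:64"),
        Target("nvptx64-nvidia-cuda", "e-i64:64"),
    ],
    symbol_device={"main": 0, "host_helper": 0, "kernel": 1},
)

host_target, host_symbols = module.view_for_device(0)
device_target, device_symbols = module.view_for_device(1)
```

Each invocation extracts only its own triple, data layout, and symbols, while a pass that wants to optimize across devices can still see both sets because they live in one module.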