Johannes Doerfert via llvm-dev
2020-Jul-30 12:57 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
[off topic] I'm not a fan of the "reply-to-list" default.

Thanks for the feedback! More below.

On 7/30/20 6:01 AM, David Chisnall via llvm-dev wrote:
> On 28/07/2020 07:00, Johannes Doerfert via llvm-dev wrote:
>> TL;DR
>> -----
>>
>> Let's allow to merge two LLVM-IR modules for different targets (with
>> compatible data layouts) into a single LLVM-IR module to facilitate
>> host-device code optimizations.
>
> I think it's worth taking a step back here and thinking through the
> problem. The proposed solution makes me nervous because it is quite a
> significant change to the compiler flow that comes from thinking of
> heterogeneous optimisation as a fat LTO problem, when to me it feels
> more like a thin LTO problem.
>
> At the moment, there's an implicit assumption that everything in a
> Module will flow to the same CodeGen back end. It can make global
> assumptions about cost models, can inline everything, and so on.
>

FWIW, I would expect that we split the module *before* the codegen stage such that the back end doesn't have to deal with heterogeneous models (right now).

I'm not sure about cost models and such though. As far as I know, we don't do global decisions anywhere but I might be wrong. Put differently, I hope we don't do global decisions as it seems quite easy to disturb the result with unrelated code changes.

> It sounds as if we have a couple of use cases:
>
> - Analysis flow between modules
> - Transforms that modify two modules
>

Yes! Notably the first bullet is bi-directional and cyclic ;)

> The first case is the motivating example of constant
> propagation. This feels like the right approach is something like
> ThinLTO, where you can collect in one module the fact that a kernel is
> invoked only with specific constant arguments in the host module and
> consume that result in the target module.
>

Except that you can have cyclic dependencies, which makes this problematic again. You might not propagate constants from the device module to the host one, but whether memory is only read/written on the device is very interesting on the host side. You can avoid memory copies, remove globals, etc. That is just what comes to mind right away.

The proposed heterogeneous modules should not limit you to "monolithic LTO", or "thin LTO" for that matter.

> The second example is what you'd need for things like kernel fusion,
> where you need to both combine two kernels in the target module and
> also modify the callers to invoke the single kernel and skip some data
> flow. For this, you need a kind of pass that can work over things that
> begin in two modules.
>

Right. Splitting, fusing, moving code, etc. all require you to modify both modules at the same time. Even if you only modify one module, you want information from both, in either direction.

> It seems that a less invasive change would be:
>
> - Use ThinLTO metadata for the first case, extend it as required.
> - Add a new kind of ModuleSetPass that takes a set of Modules and is
> allowed to modify both.
>
> This avoids any modifications for the common (single-target) case, but
> should give you the required functionality. Am I missing something?
>

This is similar to what Renato suggested early on. In addition to the "ThinLTO metadata" inefficiencies outlined above, the problem I have with the second part is that it requires writing completely new passes in a different style than anything we have. It is certainly a possibility but we can probably do it without any changes to the infrastructure.

In addition to the analysis/optimization infrastructure reasons I would like to point out that this would make our toolchains a lot easier. We have some embedding of device code in host code right now (on every level) and things like LTO for all offloading models would become much easier if we distribute the heterogeneous modules instead of yet another embedding.

I might be biased by the way "clang offload bundler" is used right now for OpenMP, HIP, etc. but I would very much like to replace that with a "clean" toolchain that performs as much LTO as possible, at least for the accelerator code.

I hope this makes some sense, feel free to ask questions :)

~ Johannes

> David
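
The bi-directional analysis flow mentioned in this message can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not use any LLVM API, and `Kernel`, `Launch`, and `optimize` are hypothetical names invented for the illustration. Host call sites feed constant arguments to the device side, and the device side reports which buffers it can still write, which lets the host drop copies.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical device-side summary of one kernel."""
    name: str
    # Each write is (buffer, guard); guard is None or (param, required_value).
    writes: list
    known_consts: dict = field(default_factory=dict)

    def may_write(self, buffer):
        for buf, guard in self.writes:
            if buf != buffer:
                continue
            if guard is None:
                return True
            param, required = guard
            # Without a known constant we must assume the guard can hold.
            if self.known_consts.get(param, required) == required:
                return True
        return False


@dataclass
class Launch:
    """Hypothetical host-side call site of a kernel."""
    kernel: str
    args: dict       # constant arguments only
    copy_back: set   # buffers copied device -> host after the launch


def optimize(kernels, launches):
    """Exchange facts between host and device until nothing changes."""
    changed = True
    while changed:
        changed = False
        # Host -> device: a parameter that is the same constant at every site.
        for k in kernels.values():
            sites = [l for l in launches if l.kernel == k.name]
            if not sites:
                continue
            for param, value in sites[0].args.items():
                if param in k.known_consts:
                    continue
                if all(s.args.get(param) == value for s in sites):
                    k.known_consts[param] = value
                    changed = True
        # Device -> host: drop copy-backs of buffers the kernel cannot write.
        for l in launches:
            dead = {b for b in l.copy_back if not kernels[l.kernel].may_write(b)}
            if dead:
                l.copy_back -= dead
                changed = True
    return kernels, launches
```

In this model the host-side constant `verbose=0` removes a guarded device-side write, and that device-side fact in turn removes a host-side copy, so neither direction alone is sufficient.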
Renato Golin via llvm-dev
2020-Jul-30 13:48 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
On Thu, 30 Jul 2020 at 13:59, Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> FWIW, I would expect that we split the module *before* the codegen stage
> such that the back end doesn't have to deal with heterogeneous models
> (right now).

Indeed. Even if the multiple targets are all supported by the same back-end (ex. different Arm families), the target info decisions are too ingrained in how we created the back-ends to be easy (or even possible) to split.

> I'm not sure about cost models and such though. As far as I know, we
> don't do global decisions anywhere but I might be wrong. Put
> differently, I hope we don't do global decisions as it seems quite easy
> to disturb the result with unrelated code changes.

Target info (ex. TTI) are dependent on triple + hidden parameters (passed down from the driver as target options), which are global.

As I said before, having multiple target triples in the source will not change that, and we'll have to create multiple groups of driver flags, applicable to different triples. Or we'll need to merge modules from different front-ends, in which case this looks more and more like LTO.

This will not be trivial to map and the data layout does not reflect any of that.

cheers,
--renato
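
As a rough illustration of "multiple groups of driver flags, applicable to different triples" combined with splitting before codegen, here is a toy sketch. It is not LLVM code: `TARGET_OPTIONS` and `split_by_triple` are hypothetical, and the CPU and feature strings are example values only.

```python
from collections import defaultdict

# Hypothetical per-triple option groups, standing in for the "hidden
# parameters" the driver would otherwise pass down once, globally.
TARGET_OPTIONS = {
    "x86_64-unknown-linux-gnu": {"cpu": "skylake", "features": ["+avx2"]},
    "nvptx64-nvidia-cuda": {"cpu": "sm_70", "features": ["+ptx64"]},
}


def split_by_triple(functions, target_options):
    """Group (name, triple) pairs into one single-target module per triple."""
    modules = defaultdict(list)
    for name, triple in functions:
        if triple not in target_options:
            raise KeyError(f"no target options registered for triple {triple!r}")
        modules[triple].append(name)
    # Each resulting module carries exactly one option group, so a back end
    # only ever sees a homogeneous module.
    return {
        triple: {"options": target_options[triple], "functions": names}
        for triple, names in modules.items()
    }
```

The lookup fails loudly for a triple without an option group, which mirrors the mapping problem raised above: the triple alone does not say which flags apply.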
Johannes Doerfert via llvm-dev
2020-Jul-30 14:08 UTC
[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
On 7/30/20 8:48 AM, Renato Golin wrote:
> On Thu, 30 Jul 2020 at 13:59, Johannes Doerfert via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> FWIW, I would expect that we split the module *before* the codegen stage
>> such that the back end doesn't have to deal with heterogeneous models
>> (right now).
> Indeed. Even if the multiple targets are all supported by the same
> back-end (ex. different Arm families), the target info decisions are
> too ingrained in how we created the back-ends to be easy (or even
> possible) to split.

Right, and I don't see the need to generate code "together" ;)

>> I'm not sure about cost models and such though. As far as I know, we
>> don't do global decisions anywhere but I might be wrong. Put
>> differently, I hope we don't do global decisions as it seems quite easy
>> to disturb the result with unrelated code changes.
> Target info (ex. TTI) are dependent on triple + hidden parameters
> (passed down from the driver as target options), which are global.
>
> As I said before, having multiple target triples in the source will
> not change that, and we'll have to create multiple groups of driver
> flags, applicable to different triples. Or we'll need to merge modules
> from different front-ends, in which case this looks more and more like
> LTO.
>
> This will not be trivial to map and the data layout does not reflect
> any of that.

So in addition to multiple target triples and DLs we would probably want multiple target info objects, correct?

At this point I ask myself if it wouldn't be better to make the target cpu, features, and other "hidden parameters" explicit in the module itself. (I suggested part of that recently anyway [0].) That way we could create the proper target info from the IR, which seems to me like something valuable even in the current single-target setting. I mean, wouldn't that allow us to make `clang -emit-llvm` followed by `opt` behave more like a single `clang` invocation? If so, that seems desirable ;)

~ Johannes

[0] https://reviews.llvm.org/D80750#2180284

>
> cheers,
> --renato
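
Part of this information is already visible in textual IR: clang records `"target-cpu"` and `"target-features"` as string function attributes next to the module-level `target triple`. The sketch below shows, under that assumption, how a tool could recover a per-attribute-group target description from the IR text alone. The helper `target_info_from_ir` and the sample module are hypothetical, and the regex handling is deliberately simplistic (it is not a real IR parser).

```python
import re


def target_info_from_ir(ir_text):
    """Collect the triple plus per-attribute-group CPU and feature strings."""
    triple_match = re.search(r'^target triple = "([^"]*)"', ir_text, re.MULTILINE)
    info = {
        "triple": triple_match.group(1) if triple_match else None,
        "groups": {},
    }
    # Attribute groups look like: attributes #0 = { ... "key"="value" ... }
    for group, body in re.findall(
        r'^attributes #(\d+) = \{(.*)\}', ir_text, re.MULTILINE
    ):
        attrs = dict(re.findall(r'"([^"]+)"="([^"]*)"', body))
        features = attrs.get("target-features", "")
        info["groups"][int(group)] = {
            "cpu": attrs.get("target-cpu"),
            "features": [f for f in features.split(",") if f],
        }
    return info


# Hand-written sample module, used only for illustration.
SAMPLE_IR = '''target triple = "x86_64-unknown-linux-gnu"

define i32 @f() #0 {
  ret i32 0
}

attributes #0 = { nounwind "target-cpu"="x86-64" "target-features"="+sse,+sse2" }
'''

info = target_info_from_ir(SAMPLE_IR)
```

Other "hidden parameters" that only exist as driver-level target options have no such representation in this sample, so they cannot be recovered this way.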