Michael Kruse via llvm-dev
2020-Jan-21 23:53 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Wed, Jan 15, 2020 at 03:30, Renato Golin <rengolin at gmail.com> wrote:

> > I see it as an advantage in respect of adoption: It can be switched
> > on and off without affecting other parts.
>
> That's not necessarily true.
>
> If we do like Polly, it is, but then the ability to reuse code is very
> low and the time spent converting across is high. If we want to reuse,
> then we'll invariably add behavioural dependencies and disabling the
> pass may have side-effects.

This applies literally to any pass. I think the problem of reusability
is even worse for the current loop optimization passes. We have
multiple, partially transformation-specific dependence analyses, such
as LoopAccessAnalysis, DependenceInfo, LoopInterchangeLegality, etc.
Another one is currently in the works. https://xkcd.com/927/ actually
does apply here, but I also think that pass-specific dependence
analyses do not scale.

> > I'd put loop optimizations earlier into the pipeline than
> > vectorization. Where exactly is a phase ordering problem. I'd want
> > to at least preserve multi-dimensional subscripts. Fortunately
> > MemRef is a core MLIR construct and unlikely to be lowered before
> > lowering to another representation (likely LLVM-IR).
>
> Many front-ends do that even before lowering to IR because of the
> richer semantics of the AST, but it's also common for that to
> introduce bugs down the line (don't want to name any proprietary
> front-ends here).

This is a problem for any intermediate representation. But isn't that
also the point of MLIR? To be able to express higher-level language
concepts in the IR as dialects? This as well might introduce bugs. One
example is the lowering of multi-dimensional arrays from Clang's AST
to LLVM-IR. We can argue whether the C/C++ spec would allow
GetElementPtr to be emitted with the "inrange" modifier, but for VLAs,
we cannot even express them in the IR, so we had an RFC to change
that.
I don't find the argument "there might be bugs" very convincing.

> > I don't see how this is relevant for a Clang-based pipeline. Other
> > languages likely need a different pipeline than one intended for
> > C/C++ code.
>
> Yes, but we want our passes to work for all languages and be less
> dependent on how well they lower their code.
>
> If they do it well, awesome. If not, and if we can identify patterns
> in LLVM IR then there is no reason not to.

This was relevant to the discussion that /all/ front-ends would have
to generate good-enough annotations for loop transformations. Only the
ones that do might enable loop optimization passes. Generally, I'd try
to make it easy for other front-ends to have loop optimizations. For
instance, avoid isa<LoadInst> in favor of a more generic
"mayReadFromMemory" in analysis/transformation phases.

Michael
Renato Golin via llvm-dev
2020-Jan-22 12:14 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Tue, 21 Jan 2020 at 23:54, Michael Kruse <llvmdev at meinersbur.de> wrote:

> On Wed, Jan 15, 2020 at 03:30, Renato Golin <rengolin at gmail.com> wrote:
> > > I see it as an advantage in respect of adoption: It can be
> > > switched on and off without affecting other parts.
> >
> > That's not necessarily true.
>
> This applies literally to any pass.

Precisely why I don't think adding more passes is an advantage to
adoption. :)

> I don't find the argument "there might be bugs" very convincing.

Sorry, it wasn't an argument, just a jest at the expense of some old
commercial front-ends.

Pass ordering is complex no matter how you slice it.

> This was relevant to the discussion that /all/ front-ends would have
> to generate good-enough annotations for loop transformations. Only the
> ones that do might enable loop optimization passes.

This doesn't scale. You end up with a few metadata from a single
front-end justifying a huge new pass that does a few things in even
fewer cases.

VPlan is still crawling, requiring small incremental improvements,
because we're trying to replace an existing pass. Your proposal is a
new pass that does some new things and therefore shouldn't need to be
incremental (only when the correct info is present).

But that means now we have two loop transformation infrastructures
that could radically change the IR (and the loop metadata, if any).
Which one passes first will have the advantage; as I think we agreed,
pass ordering is not trivial.

If you restrict this new pass to only doing transformations that are
guaranteed to be valid and better (for some definition of both), then
you'll end up like Polly that does wonders to a very small selection
of loops and nothing at all to most of them, just wasted time looking
at all the possibilities.

If you want to be more aggressive, then the IR will change more often
and the pass ordering problem gets worse, requiring changes in later
passes to cope with the changes.
For better or worse, we already have such a pass: the VPlan
infrastructure. It might not be perfect, but it's the path we took
years ago, and I'd strongly encourage people to make it better before
throwing it away and coming up with a completely new idea.

VPlan is a simple idea: represent loops in a more meaningful way,
represent transformations in a composable way, and find a sequence of
transforms that would yield the best result. To me, your proposal is
identical in its meaning, but has different implementation details:

* A tree instead of VPlan objects, with additional benefits for new cases.
* Heuristic search of transforms with cheap state copies.

To me, it sounds like we could do both in VPlan, and even call it
LTPlan (Loop Transformation plan) to make it clear it's more generic
than just SIMD vectorisation.

More importantly, it would be a series of incremental steps towards a
better loop infrastructure, building on the work of the last many
years, and likely to show real benefits much sooner and with far fewer
conflicts than starting from scratch.

--renato
Michael Kruse via llvm-dev
2020-Jan-26 02:05 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Wed, Jan 22, 2020 at 02:14, Renato Golin <rengolin at gmail.com> wrote:

> Precisely why I don't think adding more passes is an advantage to
> adoption. :)

The alternative is to have a separate pass for each kind of loop
transformation, i.e. many more than a single loop transformation pass.

> > I don't find the argument "there might be bugs" very convincing.
>
> Sorry, it wasn't an argument, just a jest at the expense of some old
> commercial front-ends.
>
> Pass ordering is complex no matter how you slice it.

Indeed. I am already concerned about additional phase ordering
problems if we implement each loop transformation in its own pass,
e.g. between loop fusion and loop distribution. Do we first fuse into
as few loops as possible and then distribute, or the other way around?

> > This was relevant to the discussion that /all/ front-ends would have
> > to generate good-enough annotations for loop transformations. Only the
> > ones that do might enable loop optimization passes.
>
> This doesn't scale. You end up with a few metadata from a single
> front-end justifying a huge new pass that does a few things in even
> fewer cases.

I'd think that the metadata is not front-end/language specific. A
language where most instructions can access any memory is arguably
harder to optimize than a language where only a selected set of
instructions can do that. But the metadata describing what memory an
instruction can access is not front-end specific.

> Your proposal is a new pass that does some new things and therefore
> shouldn't need to be incremental (only when the correct info is
> present).
>
> But that means now we have two loop transformation infrastructures
> that could radically change the IR (and the loop metadata, if any).

I don't think LLVM's current loop optimizations are well developed.
Only LoopVectorize and LoopUnroll are even enabled by default.

> Which one passes first will have the advantage; as I think we agreed,
> pass ordering is not trivial.

Which is one of the things this proposal addresses.

> If you restrict this new pass to only doing transformations that are
> guaranteed to be valid and better (for some definition of both),

This is a strange argument. Do you want transformations that are
invalid and/or worse?

> then you'll end up like Polly that does wonders to a very small
> selection of loops and nothing at all to most of them, just wasted
> time looking at all the possibilities.

This is exactly what this proposal is addressing. I think the small
selection mostly stems from Polly requiring well-formed IR. Very often
it could algorithmically optimize a problem, but cannot represent the
IR in its internal representation: a SCoP, which is based on ISL's
schedule tree representation. The main motivation of the proposal is
to address exactly this: there would be no external library that
restricts what we can represent. A second reason is that Polly relies
on ISL's solver algorithm, which minimizes reuse distance and
maximizes parallelism, while the pass-based optimizers use
hand-written heuristics. I want to make both possible within the same
framework.

> If you want to be more aggressive, then the IR will change more often
> and the pass ordering problem gets worse, requiring changes in later
> passes to cope with the changes.

My goal is to get a hierarchical optimization order: once the
higher-level optimizations (that is, loops) have been decided on, only
lower-level optimizations (InstCombine, instruction motion, CFG
simplification, etc.) are left to do. If we have to re-do loop
optimizations, something went very wrong.

Michael