Michael Kruse via llvm-dev
2020-Jan-21 23:53 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Wed, Jan 15, 2020 at 03:30, Renato Golin <rengolin at gmail.com> wrote:

> > I see it as an advantage in respect of adoption: It can be switched
> > on and off without affecting other parts.
>
> That's not necessarily true.
>
> If we do like Polly, it is, but then the ability to reuse code is very
> low and the time spent converting across is high. If we want to reuse,
> then we'll invariably add behavioural dependencies and disabling the
> pass may have side-effects.

This applies literally to any pass. I think the problem of reusability
is even worse for the current loop optimization passes. We have
multiple, partially transformation-specific dependence analyses, such
as LoopAccessAnalysis, DependenceInfo, LoopInterchangeLegality, etc.
Another one is currently in the works. https://xkcd.com/927/ actually
does apply here, but I also think that pass-specific dependence
analyses do not scale.

> > I'd put loop optimizations earlier into the pipeline than
> > vectorization. Where exactly is a phase ordering problem. I'd want
> > to at least preserve multi-dimensional subscripts. Fortunately
> > MemRef is a core MLIR construct and unlikely to be lowered before
> > lowering to another representation (likely LLVM-IR).
>
> Many front-ends do that even before lowering to IR because of the
> richer semantics of the AST, but it's also common for that to
> introduce bugs down the line (don't want to name any proprietary
> front-ends here).

This is a problem for any intermediate representation. But isn't that
also the point of MLIR? To be able to express higher-level language
concepts in the IR as dialects? This as well might introduce bugs. One
example is the lowering of multi-dimensional arrays from Clang's AST
to LLVM-IR. We can argue whether the C/C++ spec would allow
GetElementPtr to be emitted with the "inrange" modifier, but for VLAs,
we cannot even express them in the IR, so we had an RFC to change
that.
I don't find the argument "there might be bugs" very convincing.

> > I don't see how this is relevant for a Clang-based pipeline. Other
> > languages likely need a different pipeline than one intended for
> > C/C++ code.
>
> Yes, but we want our passes to work for all languages and be less
> dependent on how well they lower their code.
>
> If they do it well, awesome. If not, and if we can identify patterns
> in LLVM IR then there is no reason not to.

This was relevant to the discussion that /all/ front-ends would have
to generate good-enough annotations for loop transformations. Only the
ones that do might enable loop optimization passes. Generally, I'd try
to make it easy for other front-ends to have loop optimizations. For
instance, avoid isa<LoadInst> in favor of a more generic
"mayReadFromMemory" in analysis/transformation phases.

Michael
Renato Golin via llvm-dev
2020-Jan-22 12:14 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Tue, 21 Jan 2020 at 23:54, Michael Kruse <llvmdev at meinersbur.de> wrote:

> On Wed, Jan 15, 2020 at 03:30, Renato Golin <rengolin at gmail.com> wrote:
> > > I see it as an advantage in respect of adoption: It can be
> > > switched on and off without affecting other parts.
> >
> > That's not necessarily true.
>
> This applies literally to any pass.

Precisely why I don't think adding more passes is an advantage to
adoption. :)

> I don't find the argument "there might be bugs" very convincing.

Sorry, it wasn't an argument, just a jest at the expense of some old
commercial front-ends.

Pass ordering is complex no matter how you slice it.

> This was relevant to the discussion that /all/ front-ends would have
> to generate good-enough annotations for loop transformations. Only the
> ones that do might enable loop optimization passes.

This doesn't scale. You end up with a few metadata from a single
front-end justifying a huge new pass that does a few things in even
fewer cases.

VPlan is still crawling, requiring small incremental improvements,
because we're trying to replace an existing pass. Your proposal is a
new pass that does some new things and therefore shouldn't need to be
incremental (only when the correct info is present).

But that means now we have two loop transformation infrastructures
that could radically change the IR (and the loop metadata, if any).
Which one passes first will have the advantage; as I think we agreed,
pass ordering is not trivial.

If you restrict this new pass to only doing transformations that are
guaranteed to be valid and better (for some definition of both), then
you'll end up like Polly that does wonders to a very small selection
of loops and nothing at all to most of them, just wasted time looking
at all the possibilities.

If you want to be more aggressive, then the IR will change more often
and the pass ordering problem gets worse, requiring changes in later
passes to cope with the changes.
For better or worse, we already have such a pass: the VPlan
infrastructure. It might not be perfect, but it's the path we took
years ago, and I'd strongly encourage people to make it better before
throwing it away and coming up with a completely new idea.

VPlan is a simple idea: represent loops in a more meaningful way,
represent transformations in a composable way, and find a sequence of
transforms that would yield the best result. To me, your proposal is
identical in its meaning, but has different implementation details:

* A tree instead of VPlan objects, with additional benefits for new cases.
* Heuristic search of transforms with cheap state copies.

To me, it sounds like we could do both in VPlan, and even call it
LTPlan (Loop Transformation plan) to make it clear it's more generic
than just SIMD vectorisation.

More importantly, it would be a series of incremental steps towards a
better loop infrastructure, building on the work of the last many
years, and likely to show real benefits much sooner and with far fewer
conflicts than starting from scratch.

--renato
Michael Kruse via llvm-dev
2020-Jan-26 02:05 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Wed, Jan 22, 2020 at 02:14, Renato Golin <rengolin at gmail.com> wrote:

> Precisely why I don't think adding more passes is an advantage to
> adoption. :)

The alternative is to have a separate pass for each kind of loop
transformation, i.e. many more than a single loop transformation pass.

> > I don't find the argument "there might be bugs" very convincing.
>
> Sorry, it wasn't an argument, just a jest at the expense of some old
> commercial front-ends.
>
> Pass ordering is complex no matter how you slice it.

Indeed. I am already concerned about additional phase ordering
problems if we implement each loop transformation in its own pass,
e.g. between loop fusion and loop distribution. Do we first fuse into
as few loops as possible and then distribute, or the other way around?

> > This was relevant to the discussion that /all/ front-ends would have
> > to generate good-enough annotations for loop transformations. Only the
> > ones that do might enable loop optimization passes.
>
> This doesn't scale. You end up with a few metadata from a single
> front-end justifying a huge new pass that does a few things in even
> fewer cases.

I'd think that the metadata is not front-end/language specific. A
language where most instructions can access any memory is arguably
harder to optimize than a language where only a selected set of
instructions can do that. But the metadata describing what memory an
instruction can access is not front-end specific.

> Your proposal is a new pass that does some new things and therefore
> shouldn't need to be incremental (only when the correct info is
> present).
>
> But that means now we have two loop transformation infrastructures
> that could radically change the IR (and the loop metadata, if any).

I don't think LLVM's current loop optimizations are well developed.
Only LoopVectorize and LoopUnroll are even enabled by default.

> Which one passes first will have the advantage; as I think we agreed,
> pass ordering is not trivial.

Which is one of the things this proposal addresses.

> If you restrict this new pass to only doing transformations that are
> guaranteed to be valid and better (for some definition of both),

This is a strange argument. Do you want transformations that are
invalid and/or worse?

> then you'll end up like Polly that does wonders to a very small
> selection of loops and nothing at all to most of them, just wasted
> time looking at all the possibilities.

This is exactly what this proposal is addressing. I think the small
selection mostly stems from Polly requiring well-formed IR. Very often
it could algorithmically optimize a problem, but cannot represent the
IR in its internal representation: a SCoP, which is based on ISL's
schedule tree representation. The main motivation of the proposal is
to address exactly this: there would be no external library that
restricts what we can represent. A second reason is that Polly relies
on ISL's solver algorithm, which minimizes reuse distance and
maximizes parallelism, while the pass-based optimizers use
hand-written heuristics. I want to make both possible within the same
framework.

> If you want to be more aggressive, then the IR will change more often
> and the pass ordering problem gets worse, requiring changes in later
> passes to cope with the changes.

My goal is to get a hierarchical optimization order: once the
higher-level optimizations (that is, loops) have been decided on, only
lower-level optimizations (InstCombine, instruction motion, CFG
simplification, etc.) are left to do. If we have to re-do loop
optimizations, something went very wrong.

Michael