Michael Kruse via llvm-dev

2020-Jan-15 04:39 UTC

[llvm-dev] Writing loop transformations on the right representation is more productive

On Sat, 11 Jan 2020 at 07:43, Renato Golin <rengolin at gmail.com> wrote:
> On Sat, 11 Jan 2020 at 00:34, Michael Kruse <llvmdev at meinersbur.de> wrote:
> > Yes, as mentioned in the Q&A. Unfortunately VPlan is not able to
> > represent arbitrary code, nor does it have cheap copies.
>
> Orthogonal, but we should also be looking into implementing the cheap
> copies in VPlan if we want to search for composable plans.
VPlan structures have many references to neighboring structures such as
parents and use-def chains. This makes adding cheap copies as an
afterthought really hard.
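
To illustrate why back-references get in the way of cheap copies, here is a
minimal sketch in Python. It is a toy model only: Node, LinkedNode and
replace_child are hypothetical names, not VPlan or proposal code. A tree whose
nodes know only their children can share unchanged subtrees between variants;
a tree with parent pointers cannot.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Immutable tree node: it knows its children, but not its parent."""
    label: str
    children: Tuple["Node", ...] = ()


def replace_child(parent: Node, index: int, new_child: Node) -> Node:
    """Return a modified copy of `parent`; all other children are shared."""
    kids = parent.children[:index] + (new_child,) + parent.children[index + 1:]
    return Node(parent.label, kids)


# Hypothetical loop nest: func { loop_i { loop_j { stmt } }, loop_k { stmt2 } }
stmt = Node("stmt")
loop_j = Node("loop_j", (stmt,))
loop_i = Node("loop_i", (loop_j,))
loop_k = Node("loop_k", (Node("stmt2"),))
func = Node("func", (loop_i, loop_k))

# Build a transformed candidate without touching the original tree.
tiled_i = Node("loop_i_tiled", (loop_j,))
variant = replace_child(func, 0, tiled_i)

# Only the path from the root to the change was re-created;
# untouched subtrees are shared by identity, not copied.
assert variant.children[1] is loop_k
assert tiled_i.children[0] is loop_j
# The original is still intact, so several candidates can coexist.
assert func.children[0] is loop_i


class LinkedNode:
    """Mutable node with a parent back-reference, as in many compiler IRs."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self  # each node can name only one parent


inner = LinkedNode("loop_j")
original = LinkedNode("loop_i", [inner])
candidate = LinkedNode("loop_i_tiled", [inner])  # tries to share `inner`

# The back-reference was overwritten, so `original` is now inconsistent.
# A real variant would need a deep copy of the whole subtree instead.
assert inner.parent is candidate
```

In the first half, creating a candidate costs one new node per level on the
path to the change. In the second half, every reference that points "up" or
"sideways" forces the copy to be deep.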

> > This conversion is a possibility and certainly not the main motivation
> > for a loop hierarchy.
>
> I know. There are many things that can be done with what you propose,
> but we should focus on what's the main motivation.
>
> From what I can tell, the tree representation is a concrete proposal
> for the many year discussion about parallel IR.
As I recall, the Parallel IR approaches were trying to add parallel
constructs to the existing LLVM-IR. This added the issue that the current
infrastructure suddenly needs to handle those as well, becoming a major
problem for adoption.

> The short paper doesn't mention that, nor does it discuss other
> opportunities to fix pipeline complexity (that is inherent to any
> compiler).
>
> I still believe that many of the techniques you propose are meaningful
> ways to solve them, but creating another IR will invariably create
> some adoption barriers.
I see it as an advantage in respect of adoption: It can be switched on and
off without affecting other parts.

> > Are you arguing against code versioning? It is already done today by
> > multiple passes such as LoopVersioningLICM, LoopDistribute,
> > LoopUnrollAndJam and LoopVectorize. The proposal explicitly tries to
> > avoid code bloat by having just one fallback copy. Runtime conditions
> > can be chosen more or less optimistically, but I don't see how this
> > should be an argument for all kinds of versioning.
>
> No. I'm cautious about the combination of heuristic search and
> versioning, especially when the conditions are runtime based. It may
> be hard to CSE them later.
>
> The paths found may not be optimal in terms of intermediate states.

Versioning is always a trade-off between how likely the preconditions apply
and code size (and maybe how expensive the runtime checks are). IMHO this
concern is separate from how code versioning is implemented.
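
A minimal sketch of that trade-off, with Python standing in for the code a
versioning pass would emit. The names ranges_overlap and shift_copy are
hypothetical, and the slice assignment only plays the role of an optimized
(e.g. vectorized) loop body.

```python
def ranges_overlap(a_start, b_start, n):
    """Runtime precondition: do [a, a+n) and [b, b+n) overlap?"""
    return a_start < b_start + n and b_start < a_start + n


def shift_copy(buf, dst, src, n):
    """Compute buf[dst+i] = buf[src+i] + 1 for i in 0..n-1, in loop order."""
    if not ranges_overlap(dst, src, n):
        # Optimized version: reads everything first, then writes.
        # This reordering is only legal because the check passed.
        buf[dst:dst + n] = [x + 1 for x in buf[src:src + n]]
        return "optimized"
    # Single fallback copy: the original loop, always correct.
    for i in range(n):
        buf[dst + i] = buf[src + i] + 1
    return "fallback"


# Disjoint ranges: the precondition holds and the fast path runs.
a = [0, 1, 2, 3, 0, 0, 0, 0]
assert shift_copy(a, 4, 0, 4) == "optimized"
assert a == [0, 1, 2, 3, 1, 2, 3, 4]

# Overlapping ranges: each iteration reads what the previous one wrote,
# so the result depends on loop order and the fallback must run.
b = [0, 0, 0, 0, 0]
assert shift_copy(b, 1, 0, 4) == "fallback"
assert b == [0, 1, 2, 3, 4]
```

The cost side of the trade-off is visible here: one extra comparison per
invocation and one extra copy of the loop body. How optimistic the condition
should be depends on how often inputs like the second call occur.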

> > > Don't get me wrong, I like the idea, it's a cool experiment using some
> > > cool data structures and algorithms. But previous experiences with the
> > > pass manager have, well, not gone smoothly in any shape or form.
> >
> > What experiments? I don't see a problem if the pass manager has to
> > invalidate analyses and re-run canonicalization passes. This happens
> > many times in the default pass pipelines. In addition, this
> > invalidation is only necessary if the loop optimization pass optimizes
> > something, in which case the additional cost should be justified.
>
> My point goes back to doing that in VPlan, then tree. The more
> back-and-forth IR transformations we add to the pipeline, the more
> brittle it will be.
Agreed, but IMHO this is the price to pay for better loop optimizations.

> The original email also proposes, for the future, to do all sorts of
> analyses and transformations in the tree representation, and that will
> likely be incompatible with (or at least not propagated through) the
> conversions.
Correct, but I'd argue these are different kinds of analyses, not
necessarily even useful for different representations. MLIR also has its
own set of analyses, separate from those on LLVM-IR.

> > In a previous RFC [8] I tried to NOT introduce a data structure but to
> > re-use LLVM-IR. The only discussion about that RFC was about
> > not 'abusing' LLVM-IR.
> >
> > https://lists.llvm.org/pipermail/llvm-dev/2017-October/118169.html
> > https://lists.llvm.org/pipermail/llvm-dev/2017-October/118258.html
> >
> > I definitely see the merits of using fewer data structures, but it is
> > also hard to re-use something existing for a different purpose (in
> > this case: VPlan) without making both more complex.
>
> My point about avoiding more structures and IRs was related to VPlan
> and MLIR, not LLVM-IR.
>
> I agree there should be an abstraction layer to do parallelisation
> analysis, but we already have two, and I'd rather add many of your
> good proposals on those than create a third.
>
> Perhaps it's not clear how we could do that now, but we should at
> least try to weigh the options.
>
> I'd seriously look at adding a tree-like annotation as an MLIR
> dialect, and use it for lean copies.
Like VPlan, MLIR is a representation with many references between objects
from different levels. I do not see how to add cheap copies as an
afterthought.


> > For the foreseeable future, Clang will generate LLVM-IR, but our
> > motivation is to (also) optimize C/C++ code. That is, I do not see a
> > way to not (also) handle LLVM-IR until Clang is changed to generate
> > MLIR (which then again will be another data structure in the system).
>
> Even if/when Clang generates MLIR, there's no guarantee the high-level
> dialects will be preserved until the vectorisation pass.
I'd put loop optimizations earlier into the pipeline than vectorization.
Where exactly is a phase ordering problem. I'd want to at least preserve
multi-dimensional subscripts. Fortunately MemRef is a core MLIR construct
and unlikely to be lowered before lowering to another representation
(likely LLVM-IR).

> And other
> front-ends may not generate the same quality of annotations.
> We may have to re-generate what we need anyway, so no point in waiting
> for all the front-ends to do what we need as well as all the previous
> passes to guarantee to keep it.
I don't see how this is relevant for a Clang-based pipeline. Other
languages likely need a different pipeline than one intended for C/C++ code.

There are not a lot of high-level semantics required to be preserved to
build a loop hierarchy.

Thanks for the productive discussion,
Michael

Renato Golin via llvm-dev

2020-Jan-15 13:29 UTC

[llvm-dev] Writing loop transformations on the right representation is more productive

On Wed, 15 Jan 2020 at 04:39, Michael Kruse <llvmdev at meinersbur.de> wrote:
> As I recall, the Parallel IR approaches were trying to add parallel
> constructs to the existing LLVM-IR. This added the issue that the current
> infrastructure suddenly needs to handle those as well, becoming a major
> problem for adoption.
Yes, and that's why we could never agree on the one representation. A
completely separate one solves that problem, but introduces another,
itself.

> I see it as an advantage in respect of adoption: It can be switched on and
> off without affecting other parts.
That's not necessarily true.

If we do like Polly, it is, but then the ability to reuse code is very
low and the time spent converting across is high. If we want to reuse,
then we'll invariably add behavioural dependencies and disabling the
pass may have side-effects.

> Versioning is always a trade-off between how likely the preconditions apply
> and code size (and maybe how expensive the runtime checks are). IMHO this
> concern is separate from how code versioning is implemented.
Agreed.

> Agreed, but IMHO this is the price to pay for better loop optimizations.
This may be true, and I can easily accept that, as long as we're all
aware of the costs of doing so up front.

> I'd put loop optimizations earlier into the pipeline than vectorization.
> Where exactly is a phase ordering problem. I'd want to at least preserve
> multi-dimensional subscripts. Fortunately MemRef is a core MLIR construct
> and unlikely to be lowered before lowering to another representation
> (likely LLVM-IR).
Many front-ends do that even before lowering to IR because of the
richer semantics of the AST, but it's also common for that to
introduce bugs down the line (don't want to name any proprietary
front-ends here).

I agree moving high-level optimisation passes up and doing so in a
high-level IR is a good idea.

> I don't see how this is relevant for a Clang-based pipeline. Other
> languages likely need a different pipeline than one intended for C/C++ code.
Yes, but we want our passes to work for all languages and be less
dependent on how well they lower their code.

If they do it well, awesome. If not, and if we can identify patterns
in LLVM IR then there is no reason not to.

cheers,
--renato

Michael Kruse via llvm-dev

2020-Jan-21 23:53 UTC

[llvm-dev] Writing loop transformations on the right representation is more productive

On Wed, 15 Jan 2020 at 03:30, Renato Golin <rengolin at gmail.com> wrote:
> > I see it as an advantage in respect of adoption: It can be switched on
> > and off without affecting other parts.
>
> That's not necessarily true.
>
> If we do like Polly, it is, but then the ability to reuse code is very
> low and the time spent converting across is high. If we want to reuse,
> then we'll invariably add behavioural dependencies and disabling the
> pass may have side-effects.
This applies literally to any pass.

I think the problem of reusability is even worse for the current loop
optimization passes. We have multiple, partially
transformation-specific dependence analyses, such as LoopAccessAnalysis,
DependenceInfo, LoopInterchangeLegality, etc. Another one is currently
in the works.

https://xkcd.com/927/ actually does apply here, but I also think that
pass-specific dependence analyses do not scale.

> > I'd put loop optimizations earlier into the pipeline than vectorization.
> > Where exactly is a phase ordering problem. I'd want to at least preserve
> > multi-dimensional subscripts. Fortunately MemRef is a core MLIR construct
> > and unlikely to be lowered before lowering to another representation
> > (likely LLVM-IR).
>
> Many front-ends do that even before lowering to IR because of the
> richer semantics of the AST, but it's also common for that to
> introduce bugs down the line (don't want to name any proprietary
> front-ends here).
This is a problem for any intermediate representation. But isn't that
also the point of MLIR? To be able to express higher-level language
concepts in the IR as dialects? This might introduce bugs as well.

One example is the lowering of multi-dimensional arrays from Clang's
AST to LLVM-IR. We can argue whether the C/C++ spec would allow
GetElementPtr to be emitted with the "inrange" modifier, but for VLAs, we
cannot even express them in the IR, so we had an RFC to change that.

I don't find the argument "there might be bugs" very convincing.

> > I don't see how this is relevant for a Clang-based pipeline. Other
> > languages likely need a different pipeline than one intended for C/C++ code.
>
> Yes, but we want our passes to work for all languages and be less
> dependent on how well they lower their code.
>
> If they do it well, awesome. If not, and if we can identify patterns
> in LLVM IR then there is no reason not to.
This was relevant to the discussion that /all/ front-ends would have
to generate good-enough annotations for loop transformations. Only the
ones that do might enable loop optimization passes.

Generally, I'd try to make it easy for other front-ends to have loop
optimizations. For instance, avoid isa<LoadInst> in favor of a more
generic "mayReadFromMemory" in analysis/transformation phases.
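
The difference can be sketched with a toy model in Python. The classes below
are hypothetical stand-ins and not LLVM's API: isinstance(i, Load) plays the
role of isa<LoadInst>(I), and may_read_from_memory() plays the role of
Instruction::mayReadFromMemory(). FrontendGather is an invented example of an
operation that some other front-end might emit.

```python
class Instruction:
    """Base class of the toy IR; by default an instruction reads no memory."""

    def may_read_from_memory(self) -> bool:
        return False


class Add(Instruction):
    pass


class Load(Instruction):
    def may_read_from_memory(self) -> bool:
        return True


class FrontendGather(Instruction):
    """Hypothetical front-end-specific operation that also reads memory."""

    def may_read_from_memory(self) -> bool:
        return True


def reads_by_kind(body):
    """Checks for one concrete instruction kind (the isa<LoadInst> style)."""
    return [inst for inst in body if isinstance(inst, Load)]


def reads_by_property(body):
    """Asks each instruction about its behaviour (the generic style)."""
    return [inst for inst in body if inst.may_read_from_memory()]


loop_body = [Load(), Add(), FrontendGather()]

# The kind-based query misses the gather, so a dependence analysis built
# on it would wrongly assume that instruction touches no memory.
assert len(reads_by_kind(loop_body)) == 1
# The property-based query covers instructions it has never heard of.
assert len(reads_by_property(loop_body)) == 2
```

An analysis written against the property keeps working when a new kind of
memory-reading operation appears, without being edited.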


Michael
