Michael Kruse via llvm-dev
2020-Jan-30 09:05 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Mon, Jan 27, 2020 at 22:06, Uday Kumar Reddy Bondhugula <uday at polymagelabs.com> wrote:

> Hi Michael,
>
> Although the approach to use a higher-order in-memory abstraction like the
> loop tree will make it easier than what you have today, if you used MLIR
> for this representation, you already get a round-trippable textual format
> that is *very close* to your form. The affine.for/if, std.for/if in MLIR
> are nearly isomorphic to the tree representation you want, and as such,
> this drastically reduces the delta between the in-memory data structures
> your passes want to operate on and what you see when you print the IR.
> Normally, there'd be resistance to building a textual form / introducing a
> first-class concept in the IR for what you are proposing, but since this
> was already done for MLIR, it looks like it would be a big win from a
> compiler developers' productivity standpoint if you just used MLIR for this
> loop tree representation. With regard to the FAQ, I can't tell whether you
> meant something else or missed the representation used in MLIR for the
> affine dialect or in general for "ops with regions".

The point of the proposal is not having a first-class construct for loops, but to allow speculative transformations. Please see my response to Chris Lattner:

https://lists.llvm.org/pipermail/llvm-dev/2020-January/138778.html

>> Q: Relation to MLIR?
>> A: MLIR is more similar to LLVM-IR than a loop hierarchy. For
>
> It's completely the opposite unless you are looking only at MLIR's std
> dialect! The affine dialect as well as the std.for/if (currently misnamed
> as loop.for/loop.if) are actually a loop tree. The affine ops are just an
> affine loop AST isomorphic to the materialization of polyhedral
> domains/schedules via code generation. Every ISL AST or the output of
> polyhedral code generation can be represented in the affine dialect and
> vice versa. MLIR's loop/if ops are a hierarchy rather than a flat form /
> list-of-blocks CFG.

As per my discussion with Chris Lattner, this is a very subjective question. It might be controversial, but I don't see MLIR regions as much more than syntactic sugar for inlined function calls that allow referencing the outer regions' definitions. This does not mean that I think they are useless; on the contrary. Regarding the affine dialect, I see the same problem that Polly has when creating a schedule tree representation: a lot of work has to be done to make IR originating from Clang compatible. Everything becomes easy if the front-end can generate the affine dialect out of the box.

>> still have to be rediscovered. However, a loop hierarchy optimizer
>> could be applied to MLIR just as well as to LLVM-IR.
>
> This is true, but it's easier to apply it to MLIR because the actual IR is
> close by miles to the in-memory thing your loop hierarchy optimizer would
> be using. For eg., here's the input IR and the output IR of a simple outer
> loop vectorization performed in MLIR:
>
> https://github.com/bondhugula/llvm-project/blob/hop/mlir/test/Transforms/vectorize.mlir#L23

Again, the proposal is about the in-memory representation using red/green trees (which I firmly disagree with being close to MLIR's in-memory representation), not the textual format.

Michael
Uday Kumar Reddy Bondhugula via llvm-dev
2020-Jan-30 10:39 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
Hi Michael,

On Thu, 30 Jan 2020 at 14:36, Michael Kruse <llvmdev at meinersbur.de> wrote:

>> Hi Michael,
>>
>> Although the approach to use a higher-order in-memory abstraction like
>> the loop tree will make it easier than what you have today, if you used
>> MLIR for this representation, you already get a round-trippable textual
>> format that is *very close* to your form. The affine.for/if, std.for/if in
>> MLIR are nearly isomorphic to the tree representation you want, and as
>> such, this drastically reduces the delta between the in-memory data
>> structures your passes want to operate on and what you see when you print
>> the IR. Normally, there'd be resistance to building a textual form /
>> introducing a first-class concept in the IR for what you are proposing, but
>> since this was already done for MLIR, it looks like it would be a big win
>> from a compiler developers' productivity standpoint if you just used MLIR
>> for this loop tree representation. With regard to the FAQ, I can't tell
>> whether you meant something else or missed the representation used in MLIR
>> for the affine dialect or in general for "ops with regions".
>
> The point of the proposal is not having a first-class construct for loops,
> but to allow speculative transformations. Please see my response to Chris
> Lattner:

But that's not how your original post reads or opens with, AFAICS. Presentation aside, even if your central goal was to allow speculative transformations, the same arguments hold. More below. (I had read your responses to Chris Lattner.)

> https://lists.llvm.org/pipermail/llvm-dev/2020-January/138778.html
>
>>> Q: Relation to MLIR?
>>> A: MLIR is more similar to LLVM-IR than a loop hierarchy. For
>>
>> It's completely the opposite unless you are looking only at MLIR's std
>> dialect! The affine dialect as well as the std.for/if (currently misnamed
>> as loop.for/loop.if) are actually a loop tree. The affine ops are just an
>> affine loop AST isomorphic to the materialization of polyhedral
>> domains/schedules via code generation. Every ISL AST or the output of
>> polyhedral code generation can be represented in the affine dialect and
>> vice versa. MLIR's loop/if ops are a hierarchy rather than a flat form /
>> list-of-blocks CFG.
>
> As per my discussion with Chris Lattner, this is a very subjective
> question. It might be controversial, but I don't see MLIR regions as much
> more than syntactic sugar for inlined function calls that allow referencing
> the outer regions' definitions. This does not mean that I think they are
> useless; on the contrary.

There are multiple ways regions in MLIR can be viewed, but the more relevant point here is that you do have a loop tree structure native in the IR with MLIR. Regions in MLIR didn't evolve from modeling inlined calls: affine.for/affine.if were originally the only two operations in MLIR that could hold blocks (which in turn are a list of operations, as you know), and there wasn't anything by the name "region". Later, "a list of blocks" was renamed to "region" in order to generalize and unify it with other concepts that could be captured with "ops with regions", one of which is isomorphic to a "just inlined" call as you view it. But that doesn't mean a loop tree doesn't exist as a first-class thing in the IR when you have the relevant ops around -- there is a hierarchy.

> Regarding the affine dialect, I see the same problem that Polly has when
> creating a schedule tree representation: a lot of work has to be done to
> make IR originating from Clang compatible. Everything becomes easy if the
> front-end can generate the affine dialect out of the box.

Right - but for the purposes of your proposal, this isn't really relevant; for that matter, one could just use the loop.for, loop.if ops if you don't want to leverage affine restrictions. Moreover, with affine.graybox ops, you can always use affine.for/if wherever you have structured loops (otherwise, you would fall back to a flat list of blocks inside the region of the graybox). While directly generating the affine dialect maximally from the frontend / Clang is one option, the other is to just generate grayboxes with trivial affine.for/if (or just loop.for/if), and then eliminate the grayboxes maximally within MLIR. This way things are reusable across different frontends, and it would be similar to Polly's approach except that you would be dealing with loops and multi-dimensional arrays where possible instead of a flat list of CFGs and GEPs.

>>> still have to be rediscovered. However, a loop hierarchy optimizer
>>> could be applied to MLIR just as well as to LLVM-IR.
>>
>> This is true, but it's easier to apply it to MLIR because the actual IR
>> is close by miles to the in-memory thing your loop hierarchy optimizer
>> would be using. For eg., here's the input IR and the output IR of a simple
>> outer loop vectorization performed in MLIR:
>>
>> https://github.com/bondhugula/llvm-project/blob/hop/mlir/test/Transforms/vectorize.mlir#L23
>
> Again, the proposal is about the in-memory representation using red/green
> trees

That's actually not how I read it. Red/green trees were *one* of the nine items you mentioned in your list, and this didn't come out as the central idea in your opening paragraphs, but let's go with this now that it's clearer to me.

> (which I firmly disagree with being close to MLIR's in-memory
> representation), not the textual format.

"Close" isn't the right word here, "closer" is! Would you agree that the representation you are proposing is closer to MLIR's representation (both its in-memory and its textual representation) than to LLVM's, or is this proximity not really relevant for the purposes of your proposal? I think it's important to know which among the two is the more important question.

Note that currently there is really very little difference between MLIR's textual format for 'for'/'if's and the in-memory form its passes use. The passes don't create any in-memory higher-order representations for these IR units; they directly update them. There is nothing like the kind of complementary abstractions you are proposing (for cost models, copy-wise, etc.).

~ Uday

--
Founder and Director, PolyMage Labs
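To make the "textual form mirrors the loop tree" claim concrete, here is a hypothetical sketch in MLIR's affine dialect; the op syntax approximates MLIR as of early 2020 and may not match any exact revision, and the kernel itself is invented for illustration:

```mlir
// Integer set guarding the lower-triangular part; d0/d1 bind to the ivs.
#lower = affine_set<(d0, d1) : (d0 - d1 >= 0)>

func @kernel(%A: memref<64x64xf32>) {
  affine.for %i = 0 to 64 {       // outer loop: one op, body in its region
    affine.for %j = 0 to 64 {     // inner loop, nested inside that region
      affine.if #lower(%i, %j) {  // guard: also a region-holding op
        %v = affine.load %A[%i, %j] : memref<64x64xf32>
        %w = mulf %v, %v : f32
        affine.store %w, %A[%i, %j] : memref<64x64xf32>
      }
    }
  }
  return
}
```

The nesting of regions *is* the loop hierarchy: printing the IR prints the tree directly, with no separate tree structure reconstructed from a flat CFG.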
Michael Kruse via llvm-dev
2020-Feb-03 06:35 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Thu, Jan 30, 2020 at 04:40, Uday Kumar Reddy Bondhugula <uday at polymagelabs.com> wrote:

> There are multiple ways regions in MLIR can be viewed, but the more
> relevant point here is that you do have a loop tree structure native in the
> IR with MLIR. Regions in MLIR didn't evolve from modeling inlined calls:
> affine.for/affine.if were originally the only two operations in MLIR that
> could hold blocks (which in turn are a list of operations, as you know),
> and there wasn't anything by the name "region". Later, "a list of blocks"
> was renamed to "region" in order to generalize and unify it with other
> concepts that could be captured with "ops with regions", one of which is
> isomorphic to a "just inlined" call as you view it. But that doesn't mean a
> loop tree doesn't exist as a first-class thing in the IR when you have the
> relevant ops around -- there is a hierarchy.

Thanks for the interesting insights into the development history of MLIR.

>> Regarding the affine dialect, I see the same problem that Polly has when
>> creating a schedule tree representation: a lot of work has to be done to
>> make IR originating from Clang compatible. Everything becomes easy if the
>> front-end can generate the affine dialect out of the box.
>
> Right - but for the purposes of your proposal, this isn't really relevant;
> for that matter, one could just use the loop.for, loop.if ops if you don't
> want to leverage affine restrictions. Moreover, with affine.graybox ops,
> you can always use affine.for/if wherever you have structured loops
> (otherwise, you would fall back to a flat list of blocks inside the region
> of the graybox). While directly generating the affine dialect maximally
> from the frontend / Clang is one option, the other is to just generate
> grayboxes with trivial affine.for/if (or just loop.for/if), and then
> eliminate the grayboxes maximally within MLIR. This way things are reusable
> across different frontends, and it would be similar to Polly's approach
> except that you would be dealing with loops and multi-dimensional arrays
> where possible instead of a flat list of CFGs and GEPs.

I think there is a relevant difference between coming from a high-level code generator and then lifting restrictions, and coming from a low-level IR which has to be raised. If we start at the high level, we will have to bail out on representing things because we cannot ensure the expected semantics of the high-level idioms (while loops, do loops, coroutines, possibly infinite loops, non-returning elements in the loop body, ...), and we have to work towards poking holes into the high-level representation that existing passes must be able to handle. When starting with a low-level approach, useful guarantees are added to the representation, but everything can be represented from the beginning. I am not saying that the two approaches cannot meet, but I am afraid that the high-level approach, like Polly, adds many bail-outs, making it difficult to use in practice. For instance, say we want to apply strip-mining to a loop. Since it does not change the execution order of any body code, it is always valid, yet we'd have to bail out if we cannot uphold the representation's guarantees. I would like to avoid that. However, I agree that MLIR has the expressiveness required for hierarchical loop structures. I don't think we need to argue about that.

> That's actually not how I read it. Red/green trees were *one* of the nine
> items you mentioned in your list, and this didn't come out as the central
> idea in your opening paragraphs, but let's go with this now that it's
> clearer to me.

Indeed, red/green trees (or DAGs) are only one of the ideas to improve loop optimizations, and they do not justify the proposal by themselves. However, they happen to be effectively necessary for others in the list (e.g. versioning, the profitability heuristic based on a cost function, etc.), and they are the reason why I think the same cannot be done with MLIR. In hindsight, I could have pointed this out more in the original RFC. Note that a hierarchical representation was not an explicit feature in the list. To convince me that MLIR is the better IR for loop optimizations, one might show that each of the features enabled by cheap subtree reuse can also be done sufficiently efficiently and easily on MLIR (or that a feature is not something one would actually want).

>> (which I firmly disagree with being close to MLIR's in-memory
>> representation), not the textual format.
>
> "Close" isn't the right word here, "closer" is! Would you agree that the
> representation you are proposing is closer to MLIR's representation (both
> its in-memory and its textual representation) than to LLVM's, or is this
> proximity not really relevant for the purposes of your proposal? I think
> it's important to know which among the two is the more important question.

I think MLIR is an evolution of LLVM-IR and Swift's IR, built around similar principles such as SSA and control-flow graphs (I understand that in addition to CFGs, MLIR also enables structured control-flow idioms). SSA is a distinguishing feature: it allows quickly traversing def/use chains (compilers without it need data-flow analyses), but it makes subtree reuse hard. Does this answer your question?

> Note that currently there is really very little difference between MLIR's
> textual format for 'for'/'if's and the in-memory form its passes use. The
> passes don't create any in-memory higher-order representations for these IR
> units; they directly update them. There is nothing like the kind of
> complementary abstractions you are proposing (for cost models, copy-wise,
> etc.).

The point I was making is that the in-memory format has references to related items such as parents and use-def chains. These are implicit in the textual format, e.g. the parent of an operation is defined by its syntactical location. When reading into memory, it is not obligatory for the objects to have all the references to related objects.

Michael
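The cheap subtree reuse that red/green trees provide can be sketched as follows. This is a minimal illustration of the general technique (as popularized by Roslyn), not code from the proposal; all class and function names are made up, and bookkeeping such as loop bounds and the strip-mining factor is elided:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# "Green" nodes: immutable and parent-free, so they can be shared
# freely between any number of tree variants.
@dataclass(frozen=True)
class Green:
    kind: str                               # e.g. "root", "loop", "stmt"
    children: Tuple["Green", ...] = ()

# "Red" nodes: thin wrappers materialized on demand that add parent
# links (and, in a real system, absolute positions) on top of a green tree.
@dataclass(frozen=True)
class Red:
    green: Green
    parent: Optional["Red"] = None

    def child(self, i: int) -> "Red":
        return Red(self.green.children[i], self)

def strip_mine(node: Green, target: Green) -> Green:
    """Return a new tree in which `target` became a 2-deep loop nest.
    Only the spine from the root to `target` is copied; every untouched
    subtree (including the loop body) is shared with the old tree."""
    if node is target:
        # Outer/inner loop pair reusing the existing body unchanged.
        return Green("loop", (Green("loop", node.children),))
    new_children = tuple(strip_mine(c, target) for c in node.children)
    if all(n is o for n, o in zip(new_children, node.children)):
        return node                         # untouched: share, don't copy
    return Green(node.kind, new_children)

body = Green("stmt")
loop = Green("loop", (body,))
other = Green("loop", (Green("stmt"),))
prog = Green("root", (loop, other))

variant = strip_mine(prog, loop)
# The sibling loop and the strip-mined body are shared, not copied,
# and the original tree is untouched:
assert variant.children[1] is other
assert variant.children[0].children[0].children[0] is body
assert prog.children[0] is loop
# Parent links exist only in the red layer, built lazily per variant:
inner = Red(variant).child(0).child(0)
assert inner.parent.green is variant.children[0]
```

Because `prog` is never mutated, a pass can build several speculative variants like `variant`, rank them with a cost model, and simply drop the losers; with in-place mutation each rejected candidate would instead have to be undone or cloned up front.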