thr3ads.net - llvm dev - [llvm-dev] Writing loop transformations on the right representation is more productive [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Michael Kruse via llvm-dev

2020-Feb-03 06:35 UTC

[llvm-dev] Writing loop transformations on the right representation is more productive

Am Do., 30. Jan. 2020 um 04:40 Uhr schrieb Uday Kumar Reddy Bondhugula
<uday at polymagelabs.com>:> There are multiple ways regions in MLIR can be viewed, but the more
relevant point here is you do have a loop tree structure native in the IR with
MLIR. Regions in MLIR didn't evolve from modeling inlined calls - the
affine.for/affine.if were originally the only two operations in MLIR that could
hold blocks (which in turn are a list of operations as you know) and there
wasn't anything by the name region. Later, "a list of blocks" was
renamed "region" in order to generalize and unify it with other
concepts that could be captured with "ops with regions", one of which
is isomorphic to a "just inlined" call as you view it. But that
doesn't mean a loop tree doesn't exist as a first class thing in the IR
when you have the relevant ops around -- there is a hierarchy.
Thanks for the interesting insights into the development history of MLIR.

>> Regarding the affine dialect, I see the same problem that Polly has
when creating a schedule tree representation: A lot of work has to be done to
make IR originating from Clang compatible. Everything becomes easy if the
front-end can generate an affine dialect out-of-the box.
>
> Right - but for the purposes of your proposal, this isn't really
relevant - for that matter, one could just use the loop.for, loop.if ops if you
don't want to leverage affine restrictions. Moreover, with affine.graybox
ops, you can always use affine.for/if wherever you have structured loops
(otherwise, you would fall back to a flat list of blocks inside the region of
the graybox.) While directly generating the affine dialect maximally from the
frontend / Clang is one option, the other is to just generate grayboxes with
trivial affine.for/if (or just loop.for/if), and then eliminate the grayboxes
maximally within MLIR. This way things are reusable across different frontends,
and it would be similar to Polly's approach except that you would be dealing
with loops/multi-dimensional arrays where possible instead of flat list of CFGs
and GEPs.
I think there is a relevant difference of whether we come from a
high-level code generator and then lift restrictions or from a
low-level IR which has to be raised. If starting with the high-level,
we will have to bail out on representing things because we cannot
ensure the expected semantics of the high-level idioms (while-,
do-loops, coroutines, possibly infinite loops, non-returning elements
in the loop body, ...) and have to work towards poking holes into the
high-level representation that existing passes must me able to handle.
When starting with a low-level approach, useful guarantees are added
to the representation, but everything can be represented at the
beginning.
I am not saying that the two approaches cannot meet, but I am afraid
that the high-level approach, like Polly, adds many bail-outs making
it difficult to use in practice. For instance, we want to apply
strip-mining to a loop. Since it does not change the execution order
of any body code, it is always valid, yet we'd have to bail out if we
cannot guarantee the representation's guarantees. I would like to
avoid that.

However, I agree that MLIR has the expressiveness requires for
hierarchical loop structures. We don't think we need to argue about
that.

> That's actually not how I read it. Red/green trees was *one* of the
nine items you mentioned in your list and this didn't come out as the
central idea in your opening paras, but let's go with this now that it's
clearer to me.
Indeed, red/green trees (or DAGs) are one one of the ideas to improve
loop optimizations, but does justify its use by itself. They happen to
be effectively necessary for others in the list (e.g. versioning,
profitiability heuristic by cost function, etc...) and the reason why
I think the same cannot be done with MLIR. In hindsight, I could have
pointed this out more in the original RFC. Note that a hierarchical
representation was not an explicit feature in the list.

To convince me that MLIR is the better IR for loop optimizations,
might show that each of the features enabled by cheap subtree reuse
can also be done sufficiently efficient and easily on MLIR (or that a
feature is not what would actually would want).

>> (which I firmly disagree with being close to MLIR's in-memory
representation), not the textual format.
>
> "close" isn't the right word here, "closer" is!
Would you agree that the representation you are proposing is closer to
MLIR's representation (both its in-memory and its textual representation)
than to LLVM's or is this proximity not really relevant for the purposes of
your proposal? I think it's important to know which among the two is the
more important question.
I think MLIR is an evolution of LLVM-IR and Swift-IR, built around
similar principles such as SSA and Control-Flow Graphs (I understand
that in addition to CFGs, MLIR also enables structured control flow
idioms). SSA is a distinguishing feature: It allows to quickly
traversing def/uses (where compilers without it need a data-flow
analyses), but make subtree reuse hard.

Does this answer your question?

> Note that currently there is really very little difference between
MLIR's textual format for 'for'/if's and the in-memory form its
passes use. The passes don't create any in-memory higher order
representations for these IR units; they directly update them. There is nothing
like the kind of complementary abstractions you are proposing (for cost models,
copy-wise, etc.).
The point I was making is that the in-memory format has references to
related items such as parents and use-def chains. These are implicit
in the textual format, e.g. the parent of an operation is defined by
its syntactical location. When reading into memory, it is not
obligatory for the objects to have all the references to related
objects.

Michael

Chris Lattner via llvm-dev

2020-Feb-06 00:13 UTC

head link

[llvm-dev] Writing loop transformations on the right representation is more productive

> On Feb 2, 2020, at 10:35 PM, Michael Kruse via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
>> 
>> That's actually not how I read it. Red/green trees was *one* of the
nine items you mentioned in your list and this didn't come out as the
central idea in your opening paras, but let's go with this now that it's
clearer to me.
> 
> Indeed, red/green trees (or DAGs) are one one of the ideas to improve
> loop optimizations, but does justify its use by itself. They happen to
> be effectively necessary for others in the list (e.g. versioning,
> profitiability heuristic by cost function, etc...) and the reason why
> I think the same cannot be done with MLIR. In hindsight, I could have
> pointed this out more in the original RFC. Note that a hierarchical
> representation was not an explicit feature in the list.
> 
> To convince me that MLIR is the better IR for loop optimizations,
> might show that each of the features enabled by cheap subtree reuse
> can also be done sufficiently efficient and easily on MLIR (or that a
> feature is not what would actually would want).
Hi Michael,

If I understand your claims, you are claiming both that red/green trees are
essential for a loop optimizer, and that this essential nature “justifies” the
cost of reinventing an entire compiler infrastructure is lower than the benefit
of using (e.g.) MLIR to do this.  I haven’t seen evidence of either point: lots
of loop optimizers exist that don’t use red/green trees.

Furthermore, my experience is that specialized IRs never get the investment in
(e.g.). testing, location propagation for debugging optimized code, textual
round tripping, pass management, and the multitude of other things that are
required for a world class compiler implementation.

-Chris
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200205/f92b6f61/attachment-0001.html>

Hal Finkel via llvm-dev

2020-Feb-06 00:51 UTC

head link

[llvm-dev] Writing loop transformations on the right representation is more productive

On 2/5/20 6:13 PM, Chris Lattner via llvm-dev wrote:>
>
>> On Feb 2, 2020, at 10:35 PM, Michael Kruse via llvm-dev 
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>>
>>>
>>> That's actually not how I read it. Red/green trees was *one* of
the
>>> nine items you mentioned in your list and this didn't come out
as
>>> the central idea in your opening paras, but let's go with this
now
>>> that it's clearer to me.
>>
>> Indeed, red/green trees (or DAGs) are one one of the ideas to improve
>> loop optimizations, but does justify its use by itself. They happen to
>> be effectively necessary for others in the list (e.g. versioning,
>> profitiability heuristic by cost function, etc...) and the reason why
>> I think the same cannot be done with MLIR. In hindsight, I could have
>> pointed this out more in the original RFC. Note that a hierarchical
>> representation was not an explicit feature in the list.
>>
>> To convince me that MLIR is the better IR for loop optimizations,
>> might show that each of the features enabled by cheap subtree reuse
>> can also be done sufficiently efficient and easily on MLIR (or that a
>> feature is not what would actually would want).
>
> Hi Michael,
>
> If I understand your claims, you are claiming both that red/green 
> trees are essential for a loop optimizer, and that this essential 
> nature “justifies” the cost of reinventing an entire compiler 
> infrastructure is lower than the benefit of using (e.g.) MLIR to do 
> this.  I haven’t seen evidence of either point:

Michael and I have discussed this offilne, and I think that more 
quantitative information here would be helpful. One phrasing might be: 
For a reasonable test suite of programs, for the loops we might 
optimize, how many different loop-nest variants do we need to represent 
for costing purposes (also note that, depending on profiling information 
and/or heuristics, we may need to cost at multiple trip-count values)? 
Given that, what is the relative cost of creating deep copies of the 
loop nests and transforming them vs. the cost of using a red/green tree? 
Does that cost seem significant?

We don't want to invest significant effort into a infrastructure that we 
know ahead of time won't scale sufficiently.

> lots of loop optimizers exist that don’t use red/green trees.

I've worked with many compilers, some from vendors known for having 
strong loop optimizers, and none of them would meet our goals for 
performance, robustness, and modeling accuracy (e.g., not regressing 
performance in many cases). Our goal here is to significantly improve on 
the state of the art.

>
> Furthermore, my experience is that specialized IRs never get the 
> investment in (e.g.). testing, location propagation for debugging 
> optimized code, textual round tripping, pass management, and the 
> multitude of other things that are required for a world class compiler 
> implementation.

I agree, but I think that these are well-known requirements within this 
ecosystem. What this means also depends on context (it might be more 
helpful to think of this more like SCEV than like a custom IR).

  -Hal

>
> -Chris
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200205/8da9bd11/attachment.html>

Michael Kruse via llvm-dev

2020-Feb-06 04:22 UTC

head link

[llvm-dev] Writing loop transformations on the right representation is more productive

Am Mi., 5. Feb. 2020 um 18:13 Uhr schrieb Chris Lattner <clattner at
nondot.org>:> If I understand your claims, you are claiming both that red/green trees are
essential for a loop optimizer, and that this essential nature “justifies” the
cost of reinventing an entire compiler infrastructure is lower than the benefit
of using (e.g.) MLIR to do this.  I haven’t seen evidence of either point: lots
of loop optimizers exist that don’t use red/green trees.
I have spoken with multiple compiler vendors and we all could agree
that the approach of pass/transformation-local decision making is
limiting the global optimization goal the more transformations there
are.
seriously limiting the quality of global optimizations. Of those who
gave me some details, have a loop-specific data structure (although
not "red/green" trees). I don't suggest to replace anything in the
existing toolchain/pipeline, so there is no risk of regressing
anything that already works today. Every advancement will need to show
its advantages at one point, this was already the case with e.g. SSA
form.

> Furthermore, my experience is that specialized IRs never get the investment
in (e.g.). testing, location propagation for debugging optimized code, textual
round tripping, pass management, and the multitude of other things that are
required for a world class compiler implementation.
With writing the RFC I also try to win contributors about the
usefulness of the idea and I think it is feasible. For example, Polly
started as a research project, attracting contributors, but did not
succeed to make it a main component of LLVM because of other reasons.
I think if LLVM also want's a world class loop optimizer, some time
needs to be invested.

The discussion here is valuable for me, helping me to make my
presentation about it at EuroLLVM as relevant as possible. My current
idea is to take a complex loop nest, and compare optimizing it using
red/green DAGs and traditional pass-based optimizers.

Michael

Uday Kumar Reddy Bondhugula via llvm-dev

2020-Feb-08 18:22 UTC

head link

[llvm-dev] Writing loop transformations on the right representation is more productive

On Mon, 3 Feb 2020 at 12:06, Michael Kruse <llvmdev at meinersbur.de>
wrote:
> Am Do., 30. Jan. 2020 um 04:40 Uhr schrieb Uday Kumar Reddy Bondhugula
> <uday at polymagelabs.com>:
> > There are multiple ways regions in MLIR can be viewed, but the more
> relevant point here is you do have a loop tree structure native in the IR
> with MLIR. Regions in MLIR didn't evolve from modeling inlined calls -
the
> affine.for/affine.if were originally the only two operations in MLIR that
> could hold blocks (which in turn are a list of operations as you know) and
> there wasn't anything by the name region. Later, "a list of
blocks" was
> renamed "region" in order to generalize and unify it with other
concepts
> that could be captured with "ops with regions", one of which is
isomorphic
> to a "just inlined" call as you view it. But that doesn't
mean a loop tree
> doesn't exist as a first class thing in the IR when you have the
relevant
> ops around -- there is a hierarchy.
>
> Thanks for the interesting insights into the development history of MLIR.
>
>
> >> Regarding the affine dialect, I see the same problem that Polly
has
> when creating a schedule tree representation: A lot of work has to be done
> to make IR originating from Clang compatible. Everything becomes easy if
> the front-end can generate an affine dialect out-of-the box.
> >
> > Right - but for the purposes of your proposal, this isn't really
> relevant - for that matter, one could just use the loop.for, loop.if ops if
> you don't want to leverage affine restrictions. Moreover, with
> affine.graybox ops, you can always use affine.for/if wherever you have
> structured loops (otherwise, you would fall back to a flat list of blocks
> inside the region of the graybox.) While directly generating the affine
> dialect maximally from the frontend / Clang is one option, the other is to
> just generate grayboxes with trivial affine.for/if (or just loop.for/if),
> and then eliminate the grayboxes maximally within MLIR. This way things are
> reusable across different frontends, and it would be similar to Polly's
> approach except that you would be dealing with loops/multi-dimensional
> arrays where possible instead of flat list of CFGs and GEPs.
>
> I think there is a relevant difference of whether we come from a
> high-level code generator and then lift restrictions or from a
> low-level IR which has to be raised. If starting with the high-level,
> we will have to bail out on representing things because we cannot
> ensure the expected semantics of the high-level idioms (while-,
> do-loops, coroutines, possibly infinite loops, non-returning elements
> in the loop body, ...) and have to work towards poking holes into the
> high-level representation that existing passes must me able to handle.
> When starting with a low-level approach, useful guarantees are added
> to the representation, but everything can be represented at the
> beginning.
> I am not saying that the two approaches cannot meet, but I am afraid
> that the high-level approach, like Polly, adds many bail-outs making
> it difficult to use in practice. For instance, we want to apply
> strip-mining to a loop. Since it does not change the execution order
> of any body code, it is always valid, yet we'd have to bail out if we
> cannot guarantee the representation's guarantees. I would like to
> avoid that.
>
> However, I agree that MLIR has the expressiveness requires for
> hierarchical loop structures. We don't think we need to argue about
> that.
>
>
> > That's actually not how I read it. Red/green trees was *one* of
the nine
> items you mentioned in your list and this didn't come out as the
central
> idea in your opening paras, but let's go with this now that it's
clearer to
> me.
>
> Indeed, red/green trees (or DAGs) are one one of the ideas to improve
> loop optimizations, but does justify its use by itself. They happen to
> be effectively necessary for others in the list (e.g. versioning,
> profitiability heuristic by cost function, etc...) and the reason why
> I think the same cannot be done with MLIR. In hindsight, I could have
> pointed this out more in the original RFC. Note that a hierarchical
> representation was not an explicit feature in the list.
>
> To convince me that MLIR is the better IR for loop optimizations,
> might show that each of the features enabled by cheap subtree reuse
> can also be done sufficiently efficient and easily on MLIR (or that a
> feature is not what would actually would want).
>
I suspect that you are looking at this as "MLIR vs your proposal", but
the
point I've been trying to make is to compare "your proposal on
MLIR" vs
"your proposal on LLVM", i.e., you may want to evaluate the
cost/benefit of
hosting your approach on MLIR vis-a-vis LLVM while accounting for the fact
that you'll need the conversion for LLVM to MLIR and back for the relevant
IR units with the former. On this note, in my opinion, MLIR doesn't have a
unified/clear strategy for loop optimization yet (even op representation
wise); there are different ops/representations that are being pedalled by
different folks, and some I believe are redundant or are simply duplicating
infrastructure in less powerful ways. I'm saying this because if you choose
MLIR to host your stuff, you'll have choices to make on which forms to use.

>
> >> (which I firmly disagree with being close to MLIR's in-memory
> representation), not the textual format.
> >
> > "close" isn't the right word here, "closer"
is! Would you agree that the
> representation you are proposing is closer to MLIR's representation
(both
> its in-memory and its textual representation) than to LLVM's or is this
> proximity not really relevant for the purposes of your proposal? I think
> it's important to know which among the two is the more important
question.
>
> I think MLIR is an evolution of LLVM-IR and Swift-IR, built around
> similar principles such as SSA and Control-Flow Graphs (I understand
> that in addition to CFGs, MLIR also enables structured control flow
> idioms). SSA is a distinguishing feature: It allows to quickly
> traversing def/uses (where compilers without it need a data-flow
> analyses), but make subtree reuse hard.
>
> Does this answer your question?
>
Not directly, but I think I have more information now. You are probably
saying it's nearly as easy/difficult to host your approach on LLVM IR as it
is on MLIR.

All the best!

~ Uday



>
>
> > Note that currently there is really very little difference between
> MLIR's textual format for 'for'/if's and the in-memory form
its passes use.
> The passes don't create any in-memory higher order representations for
> these IR units; they directly update them. There is nothing like the kind
> of complementary abstractions you are proposing (for cost models,
> copy-wise, etc.).
>
> The point I was making is that the in-memory format has references to
> related items such as parents and use-def chains. These are implicit
> in the textual format, e.g. the parent of an operation is defined by
> its syntactical location. When reading into memory, it is not
> obligatory for the objects to have all the references to related
> objects.
>
>
> Michael
>

-- 
Founder and Director, PolyMage Labs
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200208/67f62249/attachment.html>

Michael Kruse via llvm-dev

2020-Feb-10 18:02 UTC

head link

[llvm-dev] Writing loop transformations on the right representation is more productive

Am Sa., 8. Feb. 2020 um 12:22 Uhr schrieb Uday Kumar Reddy Bondhugula
<uday at polymagelabs.com>:>> To convince me that MLIR is the better IR for loop optimizations,
>> might show that each of the features enabled by cheap subtree reuse
>> can also be done sufficiently efficient and easily on MLIR (or that a
>> feature is not what would actually would want).
>
>
> I suspect that you are looking at this as "MLIR vs your
proposal", but the point I've been trying to make is to compare
"your proposal on MLIR" vs "your proposal on LLVM", i.e.,
you may want to evaluate the cost/benefit of hosting your approach on MLIR
vis-a-vis LLVM while accounting for the fact that you'll need the conversion
for LLVM to MLIR and back for the relevant IR units with the former. On this
note, in my opinion, MLIR doesn't have a unified/clear strategy for loop
optimization yet (even op representation wise); there are different
ops/representations that are being pedalled by different folks, and some I
believe are redundant or are simply duplicating infrastructure in less powerful
ways. I'm saying this because if you choose MLIR to host your stuff,
you'll have choices to make on which forms to use.
I was indeed interpreting your arguments as "MLIR vs your proposal".
Is mentioned in the original RFC, I would also like to make "proposal
on MLIR" work (mostly for flang), but "proposal on LLVM" for new
seems
to offer the greater benefit. MLIR has the advantage that it can carry
more semantics from the front-end that can be used (multi-dimensional
array subscripts, already parallel loops, ...).


>> >> (which I firmly disagree with being close to MLIR's
in-memory representation), not the textual format.
>> >
>> > "close" isn't the right word here,
"closer" is! Would you agree that the representation you are proposing
is closer to MLIR's representation (both its in-memory and its textual
representation) than to LLVM's or is this proximity not really relevant for
the purposes of your proposal? I think it's important to know which among
the two is the more important question.
>>
>> I think MLIR is an evolution of LLVM-IR and Swift-IR, built around
>> similar principles such as SSA and Control-Flow Graphs (I understand
>> that in addition to CFGs, MLIR also enables structured control flow
>> idioms). SSA is a distinguishing feature: It allows to quickly
>> traversing def/uses (where compilers without it need a data-flow
>> analyses), but make subtree reuse hard.
>>
>> Does this answer your question?
>
>
> Not directly, but I think I have more information now. You are probably
saying it's nearly as easy/difficult to host your approach on LLVM IR as it
is on MLIR.
Yes.

Seemingly Similar Threads

Search for more apparently analagous threads

llvm dev - Feb 2020 - Writing loop transformations on the right representation is more productive

[llvm-dev] Writing loop transformations on the right representation is more productive

[llvm-dev] Writing loop transformations on the right representation is more productive

[llvm-dev] Writing loop transformations on the right representation is more productive

[llvm-dev] Writing loop transformations on the right representation is more productive

[llvm-dev] Writing loop transformations on the right representation is more productive

[llvm-dev] Writing loop transformations on the right representation is more productive

Seemingly Similar Threads