Hal Finkel via llvm-dev
2020-Feb-06 00:51 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On 2/5/20 6:13 PM, Chris Lattner via llvm-dev wrote:
>
>> On Feb 2, 2020, at 10:35 PM, Michael Kruse via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>>> That's actually not how I read it. Red/green trees were *one* of the
>>> nine items you mentioned in your list and this didn't come out as
>>> the central idea in your opening paragraphs, but let's go with this
>>> now that it's clearer to me.
>>
>> Indeed, red/green trees (or DAGs) are one of the ideas to improve
>> loop optimizations, but this alone does not justify their use. They
>> happen to be effectively necessary for others in the list (e.g.
>> versioning, profitability heuristic by cost function, etc.) and are
>> the reason why I think the same cannot be done with MLIR. In
>> hindsight, I could have pointed this out more in the original RFC.
>> Note that a hierarchical representation was not an explicit feature
>> in the list.
>>
>> To convince me that MLIR is the better IR for loop optimizations, one
>> might show that each of the features enabled by cheap subtree reuse
>> can also be done sufficiently efficiently and easily on MLIR (or that
>> a feature is not what one would actually want).
>
> Hi Michael,
>
> If I understand your claims, you are claiming both that red/green
> trees are essential for a loop optimizer, and that this essential
> nature "justifies" reinventing an entire compiler infrastructure,
> i.e. that the cost of doing so is lower than the benefit of using
> (e.g.) MLIR to do this. I haven't seen evidence of either point:

Michael and I have discussed this offline, and I think that more quantitative information here would be helpful. One phrasing might be: For a reasonable test suite of programs, for the loops we might optimize, how many different loop-nest variants do we need to represent for costing purposes (also note that, depending on profiling information and/or heuristics, we may need to cost at multiple trip-count values)?
Given that, what is the relative cost of creating deep copies of the loop nests and transforming them vs. the cost of using a red/green tree? Does that cost seem significant? We don't want to invest significant effort into an infrastructure that we know ahead of time won't scale sufficiently.

> lots of loop optimizers exist that don't use red/green trees.

I've worked with many compilers, some from vendors known for having strong loop optimizers, and none of them would meet our goals for performance, robustness, and modeling accuracy (e.g., not regressing performance in many cases). Our goal here is to significantly improve on the state of the art.

> Furthermore, my experience is that specialized IRs never get the
> investment in (e.g.) testing, location propagation for debugging
> optimized code, textual round tripping, pass management, and the
> multitude of other things that are required for a world-class compiler
> implementation.

I agree, but I think that these are well-known requirements within this ecosystem. What this means also depends on context (it might be more helpful to think of this more like SCEV than like a custom IR).

 -Hal

> -Chris

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200205/8da9bd11/attachment.html>
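[Editor's note: a toy C++ sketch of the copy-vs-sharing tradeoff Hal asks about. `LoopNode` and both helpers are invented for illustration and correspond to no existing LLVM or Polly type. Creating a variant by deep copy touches every node of the nest, while a structurally shared rewrite re-allocates only the node being changed (plus the spine up to the root), sharing all untouched subtrees between variants.]

```cpp
#include <memory>
#include <string>
#include <vector>

// Invented stand-in for a node in a loop-nest tree.
struct LoopNode {
  std::string Kind;                                    // e.g. "loop", "stmt"
  std::vector<std::shared_ptr<const LoopNode>> Children;
};

using NodeRef = std::shared_ptr<const LoopNode>;

// Deep copy: cost is O(size of the whole nest) per variant.
NodeRef deepCopy(const NodeRef &N) {
  auto C = std::make_shared<LoopNode>(*N);
  for (auto &Ch : C->Children)
    Ch = deepCopy(Ch);
  return C;
}

// Shared rewrite: only the rewritten node is re-allocated; untouched
// children are shared by pointer between the old and new variant.
NodeRef replaceChild(const NodeRef &Parent, unsigned Idx, NodeRef NewChild) {
  auto C = std::make_shared<LoopNode>(*Parent);        // copies one node
  C->Children[Idx] = std::move(NewChild);
  return C;
}
```

With sharing, producing and keeping many candidate variants of a large nest for costing allocates memory proportional to the edits made, not to the nest size times the number of variants.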
Chris Lattner via llvm-dev
2020-Feb-07 19:06 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
Hi Hal,

Just a disclaimer: I'm rearranging your comments below, I hope you don't mind.

>> lots of loop optimizers exist that don't use red/green trees.
>
> I've worked with many compilers, some from vendors known for having
> strong loop optimizers, and none of them would meet our goals for
> performance, robustness, and modeling accuracy (e.g., not regressing
> performance in many cases). Our goal here is to significantly improve
> on the state of the art.

Agreed, that is the right goal, no disagreement on that! I also agree that phase ordering is a very difficult problem to solve, and that red/green trees are one interesting approach to help with it.

>> Furthermore, my experience is that specialized IRs never get the
>> investment in (e.g.) testing, location propagation for debugging
>> optimized code, textual round tripping, pass management, and the
>> multitude of other things that are required for a world-class
>> compiler implementation.
>
> I agree, but I think that these are well-known requirements within
> this ecosystem. What this means also depends on context (it might be
> more helpful to think of this more like SCEV than like a custom IR).

SCEV is a great example of the problem that I'm talking about here. SCEV cannot be easily tested, because the IR doesn't round-trip to a FileCheck-able textual representation, and cannot have passes applied to it. This is barely acceptable for SCEV, because there are workarounds for the simple cases, but it would be extremely problematic for something as critical as the IR for a loop optimizer as a whole. SCEV also doesn't propagate location information, and has several other problems. The IR for a loop framework is going to require other duplications and have other problems;
you'll want a pass manager, etc.

> On Feb 5, 2020, at 4:51 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>> If I understand your claims, you are claiming both that red/green
>> trees are essential for a loop optimizer, and that this essential
>> nature "justifies" reinventing an entire compiler infrastructure,
>> i.e. that the cost of doing so is lower than the benefit of using
>> (e.g.) MLIR to do this. I haven't seen evidence of either point:
>
> Michael and I have discussed this offline, and I think that more
> quantitative information here would be helpful. One phrasing might be:
> For a reasonable test suite of programs, for the loops we might
> optimize, how many different loop-nest variants do we need to
> represent for costing purposes (also note that, depending on profiling
> information and/or heuristics, we may need to cost at multiple
> trip-count values)? Given that, what is the relative cost of creating
> deep copies of the loop nests and transforming them vs. the cost of
> using a red/green tree? Does that cost seem significant?
>
> We don't want to invest significant effort into an infrastructure that
> we know ahead of time won't scale sufficiently.

Again, as I mentioned above, I completely agree with your goals. That said, I have a few concerns about this:

1) My understanding is that this approach is more of a research project that needs to be proven than an implemented design the entire LLVM community should back as "the plan". If you'd like to pursue this, then I think the right way to go is to do some significant implementation work to build it out, see how it works, get experience with it, and then propose inclusion as "the" loop optimizer when it can be measured empirically. This is effectively what Polly did.

2) On MLIR, in addition to it being a nice technical solution to the IR data structures and the infrastructure backing them, a large slice of the MLIR community has exactly the same goals as you propose.
They are literally building loop transformation technology, driving the state of the art forward, and they also have phase ordering issues :-). These folks have lots of advanced thoughts on this, and a lot of expertise in vectorization, memory hierarchy optimizations, accelerators, HPC, low-level code generation, etc. Leveraging this work makes sense to me, and duplicating it seems wasteful.

3) On the technical design of red/green trees, my personal opinion is that this isn't solving the important part of the problem. Red/green trees (as far as I'm aware <https://llvm.org/devmtg/2018-10/slides/Kruse-LoopTransforms.pdf>) are a memory-saving optimization. However, "solving" the phase ordering issue requires exploring a search space that is exponential (e.g. in the depth of the transformation/pass pipeline), which is expensive in cycles, not just memory, and it relies on having a very good cost model for the generated code (which is a universally hard problem). Furthermore, on the cost model point, the performance of the code depends on the later "non-loop optimizations" as well, which are not going to be changed to use this new IR.

4) While red/green trees are interesting, there are other approaches to tackling these problems. However, they are also researchy and need to be proven. I'm happy to explain my thoughts on this if you are interested.

Overall, I am super enthusiastic about new research directions, but I think they should be done along the lines of Polly, where the "new thing" gets built and can then be quantitatively evaluated based on its merits in practice, rather than being an up-front decision based on ideas. I am also a bit concerned that this effort is highly duplicative of work already happening in the MLIR community, but duplication isn't necessarily bad when the path forward isn't perfectly clear. In either case, I'd love to see more cross-pollination between the efforts!
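[Editor's note: a toy C++ sketch of why the search Chris describes in point 3 is expensive in cycles. Every name here is invented; nothing corresponds to a real LLVM interface. Exhaustively exploring sequences of transformations visits on the order of |Transforms|^Depth states, and the final answer is only as good as the cost model that ranks them.]

```cpp
#include <functional>
#include <vector>

// Invented stand-in for an IR variant together with its modeled cost.
struct State { double SimulatedCost; };

using Transform = std::function<State(const State &)>;

// Exhaustive phase-ordering search: tries every sequence of transforms up
// to the given depth, so the number of states visited grows exponentially
// with Depth. The result is whatever the cost model scores lowest; an
// inaccurate model silently picks the wrong variant.
State searchBest(const State &S, const std::vector<Transform> &Ts,
                 unsigned Depth) {
  State Best = S; // applying no further transform is always a candidate
  if (Depth == 0)
    return Best;
  for (const Transform &T : Ts) {
    State Candidate = searchBest(T(S), Ts, Depth - 1);
    if (Candidate.SimulatedCost < Best.SimulatedCost)
      Best = Candidate; // trusts the cost model completely
  }
  return Best;
}
```

Practical optimizers prune this search (greedy selection, beam search, heuristics); the sketch only illustrates the unpruned cost that makes cheap variant representation necessary but not sufficient.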
-Chris
Michael Kruse via llvm-dev
2020-Feb-08 06:10 UTC
[llvm-dev] Writing loop transformations on the right representation is more productive
On Fri, Feb 7, 2020 at 1:06 PM Chris Lattner <clattner at nondot.org> wrote:

>>> Furthermore, my experience is that specialized IRs never get the
>>> investment in (e.g.) testing, location propagation for debugging
>>> optimized code, textual round tripping, pass management, and the
>>> multitude of other things that are required for a world-class
>>> compiler implementation.
>>
>> I agree, but I think that these are well-known requirements within
>> this ecosystem. What this means also depends on context (it might be
>> more helpful to think of this more like SCEV than like a custom IR).
>
> SCEV is a great example of the problem that I'm talking about here.
> SCEV cannot be easily tested, because the IR doesn't round-trip to a
> FileCheck-able textual representation, and cannot have passes applied
> to it. This is barely acceptable for SCEV, because there are
> workarounds for the simple cases, but would be extremely problematic
> for something as critical as the IR for a loop optimizer as a whole.
> SCEV also doesn't propagate location information, and has several
> other problems.

A textual representation is not the highest item on my priority list, but it is certainly doable. I'd wait until the data structure settles to something sufficiently stable. For instance, does the textual representation encompass the entire version history or just one snapshot (i.e., just the red tree)? Until then, I think it is sufficient that there is a predictable path from IR to red/green tree.

> The IR for a loop framework is going to require other duplications and
> have other problems, e.g. you'll want a pass manager etc.

I am not sure that a pass manager is that necessary. It adds flexibility to the optimization process, but as long as we don't support dynamically adding transformations, I think organizing the transformations in code is sufficient at the beginning.
Since transformations apply on subtrees, are prioritized, and can add candidates to be selected by a cost function, a pass manager would have to be structured differently anyway.

>> Michael and I have discussed this offline, and I think that more
>> quantitative information here would be helpful. One phrasing might
>> be: For a reasonable test suite of programs, for the loops we might
>> optimize, how many different loop-nest variants do we need to
>> represent for costing purposes (also note that, depending on
>> profiling information and/or heuristics, we may need to cost at
>> multiple trip-count values)? Given that, what is the relative cost of
>> creating deep copies of the loop nests and transforming them vs. the
>> cost of using a red/green tree? Does that cost seem significant?
>>
>> We don't want to invest significant effort into an infrastructure
>> that we know ahead of time won't scale sufficiently.
>
> Again, as I mentioned above, I completely agree with your goals. That
> said, I have a few concerns about this:
>
> 1) My understanding is that this approach is more of a research
> project that needs to be proven than an implemented design the entire
> LLVM community should back as "the plan". If you'd like to pursue
> this, then I think the right way to go is to do some significant
> implementation work to build it out, see how it works, get experience
> with it, and then propose inclusion as "the" loop optimizer when it
> can be measured empirically. This is effectively what Polly did.

100% agree.

> 2) On MLIR, in addition to being a nice technical solution to the IR
> data structures and infrastructure backing it, a large slice of the
> MLIR community has exactly the same goals as you propose. They are
> literally building loop transformation technology, driving the state
> of the art forward, and they also have phase ordering issues :-).
> These folks have lots of advanced thoughts on this, and a lot of
> expertise in vectorization, memory hierarchy optimizations,
> accelerators, HPC, low-level code generation, etc. Leveraging this
> work makes sense to me, and duplicating it seems wasteful.

Agreed, although I think there is some difference in the level we operate on. My focus is on optimizing code coming from relatively low-level C/C++, ensuring that we do not lose opportunities because the high-level representation does not exactly match the low-level IR. This does not mean we cannot learn from each other's experience.

> 3) On the technical design of red/green trees, my personal opinion is
> that this isn't solving the important part of the problem. Red/green
> trees (as far as I'm aware
> <https://llvm.org/devmtg/2018-10/slides/Kruse-LoopTransforms.pdf>) are
> a memory-saving optimization.

Not entirely: there would also be a run-time cost if we had to copy the complete IR to create a new "red" representation. If that were necessary, I think it would be a barrier to speculatively applying transformations.

> However, "solving" the phase ordering issue requires exploring a
> search space that is exponential (e.g. in the depth of the
> transformation/pass pipeline), which is expensive in cycles, not just
> memory, and it relies on having a very good cost model for the
> generated code (which is a universally hard problem). Furthermore, on
> the cost model point, the performance of the code depends on the later
> "non-loop optimizations" as well, which are not going to be changed to
> use this new IR.

Very good point. I don't claim a red/green tree is the universal solution to this problem, but I think it makes problem-space exploration significantly easier, especially when combining multiple transformations before knowing the outcome.
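[Editor's note: a minimal C++ sketch of the red/green distinction under discussion, with invented types; see the slides linked above for the actual design. "Green" nodes are immutable and freely shared between program variants; a "red" node is a lightweight cursor materialized on demand while walking down the tree, layering parent/context information over the shared green structure, so neither navigation nor speculative variant creation copies the whole IR.]

```cpp
#include <memory>
#include <vector>

// Green nodes: immutable, child pointers only, shared across all variants.
struct GreenNode {
  std::vector<std::shared_ptr<const GreenNode>> Children;
};

// Red nodes: cheap wrappers created on the fly; they know their parent,
// which the green structure deliberately does not record (a green subtree
// may have many parents across different speculative variants).
struct RedNode {
  std::shared_ptr<const GreenNode> Green;
  const RedNode *Parent; // context, valid while the walk is in progress

  RedNode child(unsigned I) const {
    return RedNode{Green->Children[I], this};
  }
};
```

Because a red node is just a pointer pair, discarding a rejected speculative variant discards only its thin red layer and whatever few green nodes the transformation actually rewrote.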
As you point out, "non-loop optimizations" will be necessary after loop optimizations as well (same as with LLVM's current loop optimizations: for instance, undoing LCSSA), such as new InstCombine opportunities after loop fusion. These can be difficult to predict by a cost function, but they are usually less significant than what can be gained by a loop transformation. Others, such as LICM, can be significant though and should be considered by a cost function (or simply applied to the red/green tree before evaluating the cost function, so they are considered automatically). This is why I am enthusiastic about speculative transformations -- a generic cost function can handle many cases without requiring code for each transformation.

> 4) While red/green trees are interesting, there are other approaches
> to tackling these problems. However, they are also researchy and need
> to be proven. I'm happy to explain my thoughts on this if you are
> interested.
>
> Overall, I am super enthusiastic about new research directions, but I
> think they should be done along the lines of Polly, where the "new
> thing" gets built and can then be quantitatively evaluated based on
> its merits in practice, rather than being an up-front decision based
> on ideas. I am also a bit concerned that this effort is highly
> duplicative of work already happening in the MLIR community, but
> duplication isn't necessarily bad when the path forward isn't
> perfectly clear. In either case, I'd love to see more
> cross-pollination between the efforts!

One of the motivations for writing the RFC is to clarify goals and requirements. I would like to avoid the situation we had with Polly, where 'blockers' were discovered only when mainstreaming. Another motivation is to find more interested parties.

Michael