thr3ads.net - llvm dev - [llvm-dev] [RFC] Polly Status and Integration [Sep 2017]

If this information is useful, please help other people find it:
Share via:

Hal Finkel via llvm-dev

2017-Sep-22 05:09 UTC

[llvm-dev] [RFC] Polly Status and Integration

Hi, Johannes,

Thanks for writing this. I certainly think you have the right idea in 
terms of the desired end state and modular design.

On 09/19/2017 07:33 PM, Johannes Doerfert wrote:> Hi Hal, Tobias, Michael, and others,
>
> I'd like to add my view (and a proposal) to this discussion and I
> apologize directly for doing this so late*. I also want to apologize
> because this email is long, contains various technical details and also
> argumentations that might need more justification. However, I am happy
> to provide further information (and/or examples) to explain my views if
> required.
>
> tldr;
> We should introduce polyhedral analysis and optimization capabilities
> into LLVM step by step. Along the way we should revisit design decisions
> made by Polly and adjust them according to the use cases in the rest of
> LLVM. Finally, we should keep Polly as a stand-alone tool to allow
> research into, and prototyping of, complex loop transformations.
>
> * There was a paper deadline end of last week and I simply had to
prioritize.
>
>
-------------------------------------------------------------------------------
>
> LLVM performs "simple" loop transformations and targets almost
> exclusively inner-most loops. Polly offers a pipeline to perform
> "complex" polyhedral-based loop optimizations on a sequence of
loop
> nests. It seems natural to integrate Polly into the LLVM pipeline as it
> apparently complements it well. However, I do not believe that it is
> wise to integrate Polly into LLVM for a variety of reasons, some of
> which I will explain in more detail now. Afterwards I will propose a
> different way to achieve (most of) the things Hal mentioned in his mail
> in an incremental way. Nevertheless, I want to stress that I am still an
> advocate of Polly and I honestly believe it to be a valuable part of the
> LLVM environment. Though I strongly feel it should be continued as a
> research tool and test bed to allow research into, and prototyping of,
> (polyhedral-model-based) analysis and optimizations.
While this may end up being the conclusion, I'd find that splitting 
unfortunate. One of the strengths of LLVM is that, within the framework 
of a production-quality compiler, we enable new compiler research. This 
strengthens research not only by allowing new ideas to be surrounded by 
existing ones in a realistic way, allowing the effects of the new ideas 
to be isolated convincingly, but also encourages new ideas to be 
developed and tested in a way that mirrors how they might be deployed in 
practice. This also encourages the research community to be part of the 
LLVM community, and I believe that's a positive for the community as a 
whole. The more that we isolate the "research tool" from the rest of
the
ecosystem, the less value the research tool has, and the less the 
community as a whole benefits from interactions with researchers.
>
> ---
>
> Polly is a deep pipeline of passes that were developed for the sole
> purpose of applying scheduling transformations at the very end. This
> "main purpose" of Polly makes it hard to effectively reuse
intermediate
> parts, e.g. the dependence analysis or the code generation framework, at
> least to the degree we might want to. In a different thread [1] Hal
> asked for the possibility to reuse Polly for tuning via pragmas e.g.,
> unrolling/interleaving/tiling/interchange. While this is interesting and
> generally possible it won't take long before the restrictions Polly
> places on the input code (to enable scheduling optimizations in the end)
> become a serious problem. The point here is that Polly has too many
> features and dependences for such an "easy" use case. (Note that
I do
> not question the legality of a user requested transformation here.) If
> you are not looking for "end-to-end polyhedral loop
optimizations" you
> want a system that answers a very specific question (e.g.,
> isVectorizable) or performs a very specific transformation (e.g, tiling)
> on a maximal set of programs. In contrast, Polly is (and I strongly
> believe it should be) developed as a system that performs complex
> optimizations, based on specialized analysis results, on a constraint
> type of programs.
>
> One might argue that Polly will become more modular when it is integrated
> into LLVM and when the intermediate results are used in more and more
> places. However, I'd say this is the "wrong direction" with
regards to:
>    1) incremental design and development,
>    2) maintainability of individual parts, and
>    3) the development of a suitable cost model (which Polly does not ship).
>
> Instead of starting with a full, hard to understand, scheduling pipeline
> we should start with the use cases at hand and design specific analysis
> (and also transformation) passes "from scratch". The long term
goal
> should be a full scheduling pipeline but the parts need to be designed
> in a modular (LLVM-way) from the very beginning. This allows us to:
>    - provide __immediate benefit__ to other developers,
>    - allow active participation in (and understanding of) the design of
>      each part, and
>    - develop the intermediate parts with the requirements of the whole
>      LLVM project in mind.
>
> Let me give two examples to make my point:
>
> -- Example 1 --
> In addition to the applicability issues there are other problems that
> arise from the full pipeline design. Let's assume we want to apply a
> pragma driven optimization to a loop that can be represented (and
> optimized) by Polly. Depending on the pragma we only need to perform a
> very specific task, e.g., adjust the iteration variable (loop inversion)
> or introduce new loops and adjust existing iteration variables (tiling).
> Polly will however represent (and actually analyze) the whole loop nest
> including the exact control flow and all accesses. In addition it
> will compute alias checks, constraints under which the representation
> is not truthful (e.g., an access function might overflow) and a lot of
> other things that are not needed for the task at hand.
I understand your point, but I think that you're assuming too much about 
the semantics of the pragmas. I believe that our current vectorization 
pragma clauses demonstrate the philosophy we should follow with other 
loop pragmas in this sense: We have vectorize(enable) and 
vectorize(assume_safety). The latter one does indeed just vectorize the 
loop without requiring dependency analysis, but the vectorize(enable) 
clause instructs the vectorizer to vectorize the loop 
(overriding/loosening some of the cost model constraints), but still 
uses the dependency analysis and generates runtime checks. Even if we 
have pragmas to direct the process, by default, we still likely want the 
full dependency analysis and the framework for generating safety predicates.
>
> -- Example 2 --
> We can also imagine we want to determine if a loop can be vectorized,
> thus if there are (short) loop carried dependences. No matter which
> technique is used, the complexity of a dependence analysis will
> certainly increase with the number (and complexity) of the analyzed
> accesses. Since Polly is designed with fine-grained scheduling
> optimizations in mind, the dependences it will compute are
> disproportionate with regards to the question asked.
Could you explain this in more detail?
>   Consequently, it
> will either take more time to determine if there are loop carried
> dependences
I think that this is clearly a question of degree. Almost any non-lazy 
analysis will over-compute for a particular transformation (e.g. 
computing a dominator tree), but there's engineering value in 
consistency. The question is just in the cost vs. the engineering 
complexity in customizing the analysis.
>   or a time-out in the computation will cause a conservatively
> negative result.
>
> ---
>
> While it is generally possible to disable some "features" of
Polly or to
> "work around them", it requires new independent code paths in an
already
> highly interleaved project. The explicit (and implicit) interactions
> between the different parts of Polly are plentiful and subtle. I don't
> think there are 5 people that know about more than half of them. (I also
> doubt there is no one that knows them all.)
This point is well taken. For better or worse, it reminds me of another 
component we've discussed a lot recently: InstCombine. Both (excluding 
isl) are also nearly the same size.

In this context, documenting these interactions is important (especially 
if we hope to rewrite large parts of it).
>
> ---
>
> I want to propose an alternative plan for which I would appreciate
> feedback. The feedback is especially useful since I will present this,
> or more accurately the concrete polyhedral analyses described below, in
> the student research competition at LLVM Dev meeting in October.
>
> The idea is simple. We rewrite parts of Polly with two goals in mind:
>    1) to provide analysis results and transformation options to LLVM, and
>    2) allow polyhedral loop transformation.
>
> A rewrite allows to revisit various design decisions that are buried
> deep in the Polly code and that hinder its application (in any context).
> The interested reader can find a short list at the end.
>
> I think, we should incrementally introduce polyhedral analysis
> capabilities in order to provide immediate benefits and to allow
> developers to get involved with the design and usage from an early point
> in time. Decoupled analyses also allow easier integration, interfaces
> that come "more natural" to non-polyhedral people, and better use
of the
> capabilities if polyhedral scheduling is not the goal.
>
> To demonstrate this incremental process I started to write a polyhedral
> value analysis**. Think of it as Scalar Evolution but with piece-wise
> defined polyhedral values as the abstract domain. This analysis is
> demand driven, works on the granularity of a LLVM value, is flow
> sensitive, and especially designed to be applicable on partially affine
> programs. It already supports most of what the polyhedral model does
> allow and includes a lot of things we prototyped in Polly. It shall be
> used when Scalar Evolution is not able to provide a sufficient result.
> On top I implemented a polyhedral memory analysis*** that is (currently)
> capable of summarizing the memory effects of a code region, e.g., a
> function. The next step is a demand-driven dependence analysis**** that
> utilizes the (often) very structured nature of a CFG. Afterwards, I
> would like to see a transformation framework that allows classical but
> also scheduling based transformations.
>
> While these analyses still depend on isl as a back-end, they never
> directly use it. Instead there is a C++ interface in-between that
> encapsulates the features needed. Once our use cases are clear we can
> re-implement this interface.
>
> ** For comparison, the value analysis covers parts of the ScopDetection,
> all of SCEVValidator and SCEVAffinator as well as parts of ScopInfo and
> ScopBuilder in Polly.
> *** The memory analysis is concerned with the construction and
> functionality Polly implements in the MemoryAccess class.
> **** The dependence analysis is similar to Polly's DependenceInfo class
> but is supposed to allow more fine-grained control.
Is the code for these available? Are you planning to contribute them? We 
could certainly consider these part of the plan.
>
> ---
>
> These are a few design decisions that I think should be revisited when
> we think about polyhedral-based analysis and optimizations in LLVM:
>
>    - We should not rely on Scalar Evolution. Instead we want to build
>      polyhedral values straight from the underlying IR. While
>      Scalar Evolution did certainly simplify Polly's development it
>      induced various problems:
>        - The lack of piece-wise affine values (other than min/max)
>          limits applicability and precision. (Though conditional SCEVs
>          are a small step in this direction.)
>        - The flow insensitive value analysis (Scalar Evolution) causes a
>          lot of "assumptions", thus versioning, for the otherwise
flow
>          sensitive Polly.
>        - There are caching issues when parts of the program are
>          transformed or when Polly is used for inter-procedural
>          analysis/optimization.
>
>    - We should not depend on single-entry-single-exit (SESE) regions.
>      Instead we analyze the code optimistically to the extend possible
>      (or required by the user). Afterwards we can easily specialize the
>      results to the code regions we are interested in. While SESE regions
>      always caused compile time issues (due to the use of the post
>      dominance tree) they also limit the analysis/transformations that
>      can be performed. As an example think of accesses that are
>      non-affine with regards to the smallest surrounding SESE region but
>      not in a smaller code region (e.g., a conditional). If the goal is
>      not loop scheduling but, for example, instruction reordering, it
>      is still valuable to determine the dependences between these
>      accesses in the smaller code region.
>
>    - We should iteratively and selective construct polyhedral
>      representations. The same holds for analysis results, e.g.
>      dependences. Let's reconsider example 2 from above. We want to
visit
>      one memory access at a time in order to build the polyhedral
>      representation and compute the new dependences. We also want to
>      immediately stop if we identify a loop-carried dependence. Finally,
>      we do not want to compute dependences in outer loop iterations or
>      dependences that are completely contained in one loop iteration.
If we're only considering inner-loop vectorization, then yes, we don't 
need outer-loop dependencies. If I understand what you mean, we actually 
might want intra-iteration dependencies for cases where reordering might 
allow for vectorization.
>
>    - We should encapsulate the code generation part completely from the
>      transformation part.
This separation sounds useful, but not necessarily to enable us to write 
a bunch of separate loop transformations. We may to split things for 
canonicaliation vs. lowering transformations, however.
>   Additionally we might want to start with
>      classical loop transformations instead of full polyhedral
>      scheduling. For a lot of classical loop transformations code
>      generation only needs to understand/change loop induction variables
>      and exit conditions. Duplication + rewriting of the loop body is
>      often not necessary. Pragma driven optimizations could be easily
>      done and we could also use heuristics similar to the ones we have
>      today (e.g., for loop distribution). The transformations we allow
>      should than grow with the scheduling decisions we allow and these
>      should be limited by a yet to be determined cost model.
Much of this sounds potentially good, but it's not clear to me that any 
of this is really an argument against integrating Polly into LLVM right 
now. The components you propose should be usable for both separate and 
integrated transformations and analysis, and at least in the short term, 
we'll end up needing isl regardless. Is your skepticism mainly about 
whether the current Polly could be transitioned to use the new 
infrastructure (without essentially rewriting all of it)? If so, I'd 
like to understand why.

Thanks again,
Hal
>
> ---
>
> I am aware that not all Polly developers will agree with my views and
> that some are simply trade-offs that do not offer a "correct
way".
> Nevertheless, I hope that the simple examples I used to point out
> various problems are enough to consider the route I proposed.
>
> Cheers,
>    Johannes
>
> [1] https://groups.google.com/d/msg/polly-dev/ACSRlFqtpDE/t5EDlIwtAgAJ
>
> On 09/01, Hal Finkel via llvm-dev wrote:
>> **
>>
>> *Hi everyone,As you may know, stock LLVM does not provide the kind of
>> ...
>>
>> -- 
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Johannes Doerfert via llvm-dev

2017-Sep-25 22:03 UTC

head link

[llvm-dev] [RFC] Polly Status and Integration

Hi Hal,

On 09/22, Hal Finkel wrote:> Hi, Johannes,
> 
> Thanks for writing this. I certainly think you have the right idea in terms
> of the desired end state and modular design.
Thanks for the feedback!

> On 09/19/2017 07:33 PM, Johannes Doerfert wrote:
> >Hi Hal, Tobias, Michael, and others,
> >
> >I'd like to add my view (and a proposal) to this discussion and I
> >apologize directly for doing this so late*. I also want to apologize
> >because this email is long, contains various technical details and also
> >argumentations that might need more justification. However, I am happy
> >to provide further information (and/or examples) to explain my views if
> >required.
> >
> >tldr;
> >We should introduce polyhedral analysis and optimization capabilities
> >into LLVM step by step. Along the way we should revisit design
decisions
> >made by Polly and adjust them according to the use cases in the rest of
> >LLVM. Finally, we should keep Polly as a stand-alone tool to allow
> >research into, and prototyping of, complex loop transformations.
> >
> >* There was a paper deadline end of last week and I simply had to
prioritize.
> >
>
>-------------------------------------------------------------------------------
> >
> >LLVM performs "simple" loop transformations and targets
almost
> >exclusively inner-most loops. Polly offers a pipeline to perform
> >"complex" polyhedral-based loop optimizations on a sequence
of loop
> >nests. It seems natural to integrate Polly into the LLVM pipeline as it
> >apparently complements it well. However, I do not believe that it is
> >wise to integrate Polly into LLVM for a variety of reasons, some of
> >which I will explain in more detail now. Afterwards I will propose a
> >different way to achieve (most of) the things Hal mentioned in his mail
> >in an incremental way. Nevertheless, I want to stress that I am still
an
> >advocate of Polly and I honestly believe it to be a valuable part of
the
> >LLVM environment. Though I strongly feel it should be continued as a
> >research tool and test bed to allow research into, and prototyping of,
> >(polyhedral-model-based) analysis and optimizations.
> 
> While this may end up being the conclusion, I'd find that splitting
> unfortunate. One of the strengths of LLVM is that, within the framework of
a
> production-quality compiler, we enable new compiler research. This
> strengthens research not only by allowing new ideas to be surrounded by
> existing ones in a realistic way, allowing the effects of the new ideas to
> be isolated convincingly, but also encourages new ideas to be developed and
> tested in a way that mirrors how they might be deployed in practice. This
> also encourages the research community to be part of the LLVM community,
and
> I believe that's a positive for the community as a whole. The more that
we
> isolate the "research tool" from the rest of the ecosystem, the
less value
> the research tool has, and the less the community as a whole benefits from
> interactions with researchers.
While I can see you point I am not sure if what you want is even
possible. Polly is driven (almost entirely) by research people because
the development allows it. If Polly is to be integrated and afterwards
developed by the same standards as the rest of LLVM, it will become more
difficult (and time intensive) to upstream research back into Polly.
Once that happens researchers will less likely commit (bigger) changes
back to the community but only publish their code and results. I think
that is a common way of doing things and Polly is (so far) an exception
due to its unique situation.

I'd even argue that a stronger separation of the different parts will
allow more researchers to play around with the polyhedral tools and also
ease the first steps into this area.

An example would be the OpenCL modeling we performed with Polly as a
research project. It was not upstreamed mostly because it would require
a compatible OpenCL driver but also because the modification to Polly
were substantial and hard to maintain. Several research groups and
companies have since tried to replicate it or to integrate our code in
their framework. While most of them were only interested in code
analysis they all were limited by the requirements of the polyhedral
scheduler + code generation. Consequently, there were various hacks and
workarounds. On the other hand, I did provide the (basically first draft
of the) polyhedral memory analysis to a research group that struggled to
get function-wide memory access summaries with Polly. They could
directly use and improve it.
> >Polly is a deep pipeline of passes that were developed for the sole
> >purpose of applying scheduling transformations at the very end. This
> >"main purpose" of Polly makes it hard to effectively reuse
intermediate
> >parts, e.g. the dependence analysis or the code generation framework,
at
> >least to the degree we might want to. In a different thread [1] Hal
> >asked for the possibility to reuse Polly for tuning via pragmas e.g.,
> >unrolling/interleaving/tiling/interchange. While this is interesting
and
> >generally possible it won't take long before the restrictions Polly
> >places on the input code (to enable scheduling optimizations in the
end)
> >become a serious problem. The point here is that Polly has too many
> >features and dependences for such an "easy" use case. (Note
that I do
> >not question the legality of a user requested transformation here.) If
> >you are not looking for "end-to-end polyhedral loop
optimizations" you
> >want a system that answers a very specific question (e.g.,
> >isVectorizable) or performs a very specific transformation (e.g,
tiling)
> >on a maximal set of programs. In contrast, Polly is (and I strongly
> >believe it should be) developed as a system that performs complex
> >optimizations, based on specialized analysis results, on a constraint
> >type of programs.
> >
> >One might argue that Polly will become more modular when it is
integrated
> >into LLVM and when the intermediate results are used in more and more
> >places. However, I'd say this is the "wrong direction"
with regards to:
> >   1) incremental design and development,
> >   2) maintainability of individual parts, and
> >   3) the development of a suitable cost model (which Polly does not
ship).
> >
> >Instead of starting with a full, hard to understand, scheduling
pipeline
> >we should start with the use cases at hand and design specific analysis
> >(and also transformation) passes "from scratch". The long
term goal
> >should be a full scheduling pipeline but the parts need to be designed
> >in a modular (LLVM-way) from the very beginning. This allows us to:
> >   - provide __immediate benefit__ to other developers,
> >   - allow active participation in (and understanding of) the design of
> >     each part, and
> >   - develop the intermediate parts with the requirements of the whole
> >     LLVM project in mind.
> >
> >Let me give two examples to make my point:
> >
> >-- Example 1 --
> >In addition to the applicability issues there are other problems that
> >arise from the full pipeline design. Let's assume we want to apply
a
> >pragma driven optimization to a loop that can be represented (and
> >optimized) by Polly. Depending on the pragma we only need to perform a
> >very specific task, e.g., adjust the iteration variable (loop
inversion)
> >or introduce new loops and adjust existing iteration variables
(tiling).
> >Polly will however represent (and actually analyze) the whole loop nest
> >including the exact control flow and all accesses. In addition it
> >will compute alias checks, constraints under which the representation
> >is not truthful (e.g., an access function might overflow) and a lot of
> >other things that are not needed for the task at hand.
> 
> I understand your point, but I think that you're assuming too much
about the
> semantics of the pragmas. I believe that our current vectorization pragma
> clauses demonstrate the philosophy we should follow with other loop pragmas
> in this sense: We have vectorize(enable) and vectorize(assume_safety). The
> latter one does indeed just vectorize the loop without requiring dependency
> analysis, but the vectorize(enable) clause instructs the vectorizer to
> vectorize the loop (overriding/loosening some of the cost model
> constraints), but still uses the dependency analysis and generates runtime
> checks. Even if we have pragmas to direct the process, by default, we still
> likely want the full dependency analysis and the framework for generating
> safety predicates.
As I stated, I assumed "force" semantics for pragmas.
> >-- Example 2 --
> >We can also imagine we want to determine if a loop can be vectorized,
> >thus if there are (short) loop carried dependences. No matter which
> >technique is used, the complexity of a dependence analysis will
> >certainly increase with the number (and complexity) of the analyzed
> >accesses. Since Polly is designed with fine-grained scheduling
> >optimizations in mind, the dependences it will compute are
> >disproportionate with regards to the question asked.
> 
> Could you explain this in more detail?
You got the idea right in your comment below. The design will simply
force us to "over-compute" dependence results.
> >  Consequently, it
> >will either take more time to determine if there are loop carried
> >dependences
> 
> I think that this is clearly a question of degree. Almost any non-lazy
> analysis will over-compute for a particular transformation (e.g. computing
a
> dominator tree), but there's engineering value in consistency. The
question
> is just in the cost vs. the engineering complexity in customizing the
> analysis.
Sure. Though, as I will discuss below, the customization effort is quite
different for a "whole pipeline approach" compared to separate
analyses.
> >  or a time-out in the computation will cause a conservatively
> >negative result.
> >
> >---
> >
> >While it is generally possible to disable some "features" of
Polly or to
> >"work around them", it requires new independent code paths in
an already
> >highly interleaved project. The explicit (and implicit) interactions
> >between the different parts of Polly are plentiful and subtle. I
don't
> >think there are 5 people that know about more than half of them. (I
also
> >doubt there is no one that knows them all.)
> 
> This point is well taken. For better or worse, it reminds me of another
> component we've discussed a lot recently: InstCombine. Both (excluding
isl)
> are also nearly the same size.
> 
> In this context, documenting these interactions is important (especially if
> we hope to rewrite large parts of it).
As I tried to argue above, that won't be an easy task.
> >---
> >
> >I want to propose an alternative plan for which I would appreciate
> >feedback. The feedback is especially useful since I will present this,
> >or more accurately the concrete polyhedral analyses described below, in
> >the student research competition at LLVM Dev meeting in October.
> >
> >The idea is simple. We rewrite parts of Polly with two goals in mind:
> >   1) to provide analysis results and transformation options to LLVM,
and
> >   2) allow polyhedral loop transformation.
> >
> >A rewrite allows to revisit various design decisions that are buried
> >deep in the Polly code and that hinder its application (in any
context).
> >The interested reader can find a short list at the end.
> >
> >I think, we should incrementally introduce polyhedral analysis
> >capabilities in order to provide immediate benefits and to allow
> >developers to get involved with the design and usage from an early
point
> >in time. Decoupled analyses also allow easier integration, interfaces
> >that come "more natural" to non-polyhedral people, and better
use of the
> >capabilities if polyhedral scheduling is not the goal.
> >
> >To demonstrate this incremental process I started to write a polyhedral
> >value analysis**. Think of it as Scalar Evolution but with piece-wise
> >defined polyhedral values as the abstract domain. This analysis is
> >demand driven, works on the granularity of a LLVM value, is flow
> >sensitive, and especially designed to be applicable on partially affine
> >programs. It already supports most of what the polyhedral model does
> >allow and includes a lot of things we prototyped in Polly. It shall be
> >used when Scalar Evolution is not able to provide a sufficient result.
> >On top I implemented a polyhedral memory analysis*** that is
(currently)
> >capable of summarizing the memory effects of a code region, e.g., a
> >function. The next step is a demand-driven dependence analysis**** that
> >utilizes the (often) very structured nature of a CFG. Afterwards, I
> >would like to see a transformation framework that allows classical but
> >also scheduling based transformations.
> >
> >While these analyses still depend on isl as a back-end, they never
> >directly use it. Instead there is a C++ interface in-between that
> >encapsulates the features needed. Once our use cases are clear we can
> >re-implement this interface.
> >
> >** For comparison, the value analysis covers parts of the
ScopDetection,
> >all of SCEVValidator and SCEVAffinator as well as parts of ScopInfo and
> >ScopBuilder in Polly.
> >*** The memory analysis is concerned with the construction and
> >functionality Polly implements in the MemoryAccess class.
> >**** The dependence analysis is similar to Polly's DependenceInfo
class
> >but is supposed to allow more fine-grained control.
> 
> Is the code for these available?
I pushed the first work in progress patch of the polyhedral value
analysis to phabricator today [0]. The interesting parts are in
  include/llvm/Analysis/PolyhedralValueInfo.h
and
  lib/Analysis/PolyhedralValueInfo.cpp
as well as the tests.

Note that the isl wrapper (PValue.{h,cpp}) lacks documentation because
it is still changing a lot.

[0] https://reviews.llvm.org/D38255
> Are you planning to contribute them? We could certainly consider these
> part of the plan.
My goal is to contribute these analysis to LLVM (assuming there is
interest for them).
> >---
> >
> >These are a few design decisions that I think should be revisited when
> >we think about polyhedral-based analysis and optimizations in LLVM:
> >
> >   - We should not rely on Scalar Evolution. Instead we want to build
> >     polyhedral values straight from the underlying IR. While
> >     Scalar Evolution did certainly simplify Polly's development it
> >     induced various problems:
> >       - The lack of piece-wise affine values (other than min/max)
> >         limits applicability and precision. (Though conditional SCEVs
> >         are a small step in this direction.)
> >       - The flow insensitive value analysis (Scalar Evolution) causes
a
> >         lot of "assumptions", thus versioning, for the
otherwise flow
> >         sensitive Polly.
> >       - There are caching issues when parts of the program are
> >         transformed or when Polly is used for inter-procedural
> >         analysis/optimization.
> >
> >   - We should not depend on single-entry-single-exit (SESE) regions.
> >     Instead we analyze the code optimistically to the extend possible
> >     (or required by the user). Afterwards we can easily specialize the
> >     results to the code regions we are interested in. While SESE
regions
> >     always caused compile time issues (due to the use of the post
> >     dominance tree) they also limit the analysis/transformations that
> >     can be performed. As an example think of accesses that are
> >     non-affine with regards to the smallest surrounding SESE region
but
> >     not in a smaller code region (e.g., a conditional). If the goal is
> >     not loop scheduling but, for example, instruction reordering, it
> >     is still valuable to determine the dependences between these
> >     accesses in the smaller code region.
> >
> >   - We should iteratively and selective construct polyhedral
> >     representations. The same holds for analysis results, e.g.
> >     dependences. Let's reconsider example 2 from above. We want to
visit
> >     one memory access at a time in order to build the polyhedral
> >     representation and compute the new dependences. We also want to
> >     immediately stop if we identify a loop-carried dependence.
Finally,
> >     we do not want to compute dependences in outer loop iterations or
> >     dependences that are completely contained in one loop iteration.
> 
> If we're only considering inner-loop vectorization, then yes, we
don't need
> outer-loop dependencies. If I understand what you mean, we actually might
> want intra-iteration dependencies for cases where reordering might allow
for
> vectorization.
Yes, that is what I meant. If you want to do outer-loop vectorization or
reordering you need will need the fine grained dependences. Though, I
actually don't argue against having them but for a variable granularity
instead.
> >   - We should encapsulate the code generation part completely from the
> >     transformation part.
> 
> This separation sounds useful, but not necessarily to enable us to write a
> bunch of separate loop transformations. We may to split things for
> canonicaliation vs. lowering transformations, however.
I think separate loop transformations have their merit* even if we have
a functioning polyhedral scheduling pipeline and especially if we do not
need to duplicate the transformation/code generation parts. Though that
is a discussion we can have when the time comes.

* We can use them for time critical compilation, to generate data for a
  cost model for the scheduler, ...
> >  Additionally we might want to start with
> >     classical loop transformations instead of full polyhedral
> >     scheduling. For a lot of classical loop transformations code
> >     generation only needs to understand/change loop induction
variables
> >     and exit conditions. Duplication + rewriting of the loop body is
> >     often not necessary. Pragma driven optimizations could be easily
> >     done and we could also use heuristics similar to the ones we have
> >     today (e.g., for loop distribution). The transformations we allow
> >     should than grow with the scheduling decisions we allow and these
> >     should be limited by a yet to be determined cost model.
> 
> Much of this sounds potentially good, but it's not clear to me that any
of
> this is really an argument against integrating Polly into LLVM right now.
It depends on what you want. If you want a polyhedral scheduler right
away, integration is the way to go. However, I don't think reuse of
Polly for other purposes is that simple and the changes I argue for are
(in this context) quite complicated (see below). That said, I think the
biggest downside of integration is the lack of an immediate benefit for
the rest of the developers and consequently the missing reason for them
to get involved. You might have people try out Polly but as long as it
does not deliver any benefits they will probably not spend the time to
understanding why. Instead they might be less interested in polyhedral
optimizations in the future.
> The components you propose should be usable for both separate and
integrated
> transformations and analysis, and at least in the short term, we'll end
up
> needing isl regardless. Is your skepticism mainly about whether the current
> Polly could be transitioned to use the new infrastructure (without
> essentially rewriting all of it)? If so, I'd like to understand why.
I do think that changing all the parts I mentioned will require a major
rewrite and that it will be more complicated than writing the components
"from scratch". The reason is simple, new components can be developed
one by one without the need to worry about the implications for later
passes. We can design them for the uses cases at hand while keeping
their use in a future polyhedral scheduling pipeline in mind. In Polly,
the things I want to change have implicit and explicit ties to various
parts in the pipeline. Consequently, it is extremely challenging to
change them only in parts of the pipeline e.g., the modeling, at least
if we want to keep the original functionality intact all the time.

Cheers,
  Johannes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: Digital signature
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170926/5aed6bc1/attachment.sig>

Tobias Grosser via llvm-dev

2017-Sep-26 07:00 UTC

head link

[llvm-dev] [RFC] Polly Status and Integration

On Tue, Sep 26, 2017, at 00:03, Johannes Doerfert wrote:> Hi Hal,
> 
> On 09/22, Hal Finkel wrote:
> > Hi, Johannes,
> > 
> > Thanks for writing this. I certainly think you have the right idea in
terms
> > of the desired end state and modular design.
> 
> Thanks for the feedback!
> 
> 
> > On 09/19/2017 07:33 PM, Johannes Doerfert wrote:
> > >Hi Hal, Tobias, Michael, and others,
> > >
> > >I'd like to add my view (and a proposal) to this discussion
and I
> > >apologize directly for doing this so late*. I also want to
apologize
> > >because this email is long, contains various technical details and
also
> > >argumentations that might need more justification. However, I am
happy
> > >to provide further information (and/or examples) to explain my
views if
> > >required.
> > >
> > >tldr;
> > >We should introduce polyhedral analysis and optimization
capabilities
> > >into LLVM step by step. Along the way we should revisit design
decisions
> > >made by Polly and adjust them according to the use cases in the
rest of
> > >LLVM. Finally, we should keep Polly as a stand-alone tool to allow
> > >research into, and prototyping of, complex loop transformations.
> > >
> > >* There was a paper deadline end of last week and I simply had to
prioritize.
> > >
> >
>-------------------------------------------------------------------------------
> > >
> > >LLVM performs "simple" loop transformations and targets
almost
> > >exclusively inner-most loops. Polly offers a pipeline to perform
> > >"complex" polyhedral-based loop optimizations on a
sequence of loop
> > >nests. It seems natural to integrate Polly into the LLVM pipeline
as it
> > >apparently complements it well. However, I do not believe that it
is
> > >wise to integrate Polly into LLVM for a variety of reasons, some
of
> > >which I will explain in more detail now. Afterwards I will propose
a
> > >different way to achieve (most of) the things Hal mentioned in his
mail
> > >in an incremental way. Nevertheless, I want to stress that I am
still an
> > >advocate of Polly and I honestly believe it to be a valuable part
of the
> > >LLVM environment. Though I strongly feel it should be continued as
a
> > >research tool and test bed to allow research into, and prototyping
of,
> > >(polyhedral-model-based) analysis and optimizations.
> > 
> > While this may end up being the conclusion, I'd find that
splitting
> > unfortunate. One of the strengths of LLVM is that, within the
framework of a
> > production-quality compiler, we enable new compiler research. This
> > strengthens research not only by allowing new ideas to be surrounded
by
> > existing ones in a realistic way, allowing the effects of the new
ideas to
> > be isolated convincingly, but also encourages new ideas to be
developed and
> > tested in a way that mirrors how they might be deployed in practice.
This
> > also encourages the research community to be part of the LLVM
community, and
> > I believe that's a positive for the community as a whole. The more
that we
> > isolate the "research tool" from the rest of the ecosystem,
the less value
> > the research tool has, and the less the community as a whole benefits
from
> > interactions with researchers.
> 
> While I can see you point I am not sure if what you want is even
> possible. Polly is driven (almost entirely) by research people because
> the development allows it. If Polly is to be integrated and afterwards
> developed by the same standards as the rest of LLVM, it will become more
> difficult (and time intensive) to upstream research back into Polly.
> Once that happens researchers will less likely commit (bigger) changes
> back to the community but only publish their code and results. I think
> that is a common way of doing things and Polly is (so far) an exception
> due to its unique situation.
Indeed Polly and PollyLabs in general are rather unique. I would claim
that
Polly and isl are -- in large parts -- developed to a standard  that
matches
LLVM. We certainly have more experimental areas but the extend we
test Polly and isl every day is clearly beyond a normal research
project.
Only this kind of maturity allows us to scale Polly to compile projects
such as AOSP.

Also, in comparsion to other research projects we have a pretty strong
track record of upstreaming our changes to LLVM.
> I'd even argue that a stronger separation of the different parts will
> allow more researchers to play around with the polyhedral tools and also
> ease the first steps into this area.
>
> An example would be the OpenCL modeling we performed with Polly as a
> research project. It was not upstreamed mostly because it would require
> a compatible OpenCL driver but also because the modification to Polly
> were substantial and hard to maintain. Several research groups and
> companies have since tried to replicate it or to integrate our code in
> their framework. While most of them were only interested in code
> analysis they all were limited by the requirements of the polyhedral
> scheduler + code generation. Consequently, there were various hacks and
> workarounds. On the other hand, I did provide the (basically first draft
> of the) polyhedral memory analysis to a research group that struggled to
> get function-wide memory access summaries with Polly. They could
> directly use and improve it.
We clearly do not upstream each research prototype. ;-)
 > > >Polly is a deep pipeline of passes that were developed for the
sole
> > >purpose of applying scheduling transformations at the very end.
This
> > >"main purpose" of Polly makes it hard to effectively
reuse intermediate
> > >parts, e.g. the dependence analysis or the code generation
framework, at
> > >least to the degree we might want to. In a different thread [1]
Hal
> > >asked for the possibility to reuse Polly for tuning via pragmas
e.g.,
> > >unrolling/interleaving/tiling/interchange. While this is
interesting and
> > >generally possible it won't take long before the restrictions
Polly
> > >places on the input code (to enable scheduling optimizations in
the end)
> > >become a serious problem. The point here is that Polly has too
many
> > >features and dependences for such an "easy" use case.
(Note that I do
> > >not question the legality of a user requested transformation
here.) If
> > >you are not looking for "end-to-end polyhedral loop
optimizations" you
> > >want a system that answers a very specific question (e.g.,
> > >isVectorizable) or performs a very specific transformation (e.g,
tiling)
> > >on a maximal set of programs. In contrast, Polly is (and I
strongly
> > >believe it should be) developed as a system that performs complex
> > >optimizations, based on specialized analysis results, on a
constraint
> > >type of programs.
> > >
> > >One might argue that Polly will become more modular when it is
integrated
> > >into LLVM and when the intermediate results are used in more and
more
> > >places. However, I'd say this is the "wrong
direction" with regards to:
> > >   1) incremental design and development,
> > >   2) maintainability of individual parts, and
> > >   3) the development of a suitable cost model (which Polly does
not ship).
> > >
> > >Instead of starting with a full, hard to understand, scheduling
pipeline
> > >we should start with the use cases at hand and design specific
analysis
> > >(and also transformation) passes "from scratch". The
long term goal
> > >should be a full scheduling pipeline but the parts need to be
designed
> > >in a modular (LLVM-way) from the very beginning. This allows us
to:
> > >   - provide __immediate benefit__ to other developers,
> > >   - allow active participation in (and understanding of) the
design of
> > >     each part, and
> > >   - develop the intermediate parts with the requirements of the
whole
> > >     LLVM project in mind.
> > >
> > >Let me give two examples to make my point:
> > >
> > >-- Example 1 --
> > >In addition to the applicability issues there are other problems
that
> > >arise from the full pipeline design. Let's assume we want to
apply a
> > >pragma driven optimization to a loop that can be represented (and
> > >optimized) by Polly. Depending on the pragma we only need to
perform a
> > >very specific task, e.g., adjust the iteration variable (loop
inversion)
> > >or introduce new loops and adjust existing iteration variables
(tiling).
> > >Polly will however represent (and actually analyze) the whole loop
nest
> > >including the exact control flow and all accesses. In addition it
> > >will compute alias checks, constraints under which the
representation
> > >is not truthful (e.g., an access function might overflow) and a
lot of
> > >other things that are not needed for the task at hand.
> > 
> > I understand your point, but I think that you're assuming too much
about the
> > semantics of the pragmas. I believe that our current vectorization
pragma
> > clauses demonstrate the philosophy we should follow with other loop
pragmas
> > in this sense: We have vectorize(enable) and vectorize(assume_safety).
The
> > latter one does indeed just vectorize the loop without requiring
dependency
> > analysis, but the vectorize(enable) clause instructs the vectorizer to
> > vectorize the loop (overriding/loosening some of the cost model
> > constraints), but still uses the dependency analysis and generates
runtime
> > checks. Even if we have pragmas to direct the process, by default, we
still
> > likely want the full dependency analysis and the framework for
generating
> > safety predicates.
> 
> As I stated, I assumed "force" semantics for pragmas.
> > >-- Example 2 --
> > >We can also imagine we want to determine if a loop can be
vectorized,
> > >thus if there are (short) loop carried dependences. No matter
which
> > >technique is used, the complexity of a dependence analysis will
> > >certainly increase with the number (and complexity) of the
analyzed
> > >accesses. Since Polly is designed with fine-grained scheduling
> > >optimizations in mind, the dependences it will compute are
> > >disproportionate with regards to the question asked.
> > 
> > Could you explain this in more detail?
> 
> You got the idea right in your comment below. The design will simply
> force us to "over-compute" dependence results.
> 
> > >  Consequently, it
> > >will either take more time to determine if there are loop carried
> > >dependences
> > 
> > I think that this is clearly a question of degree. Almost any non-lazy
> > analysis will over-compute for a particular transformation (e.g.
computing a
> > dominator tree), but there's engineering value in consistency. The
question
> > is just in the cost vs. the engineering complexity in customizing the
> > analysis.
> 
> Sure. Though, as I will discuss below, the customization effort is quite
> different for a "whole pipeline approach" compared to separate
analyses.
Just to clarify. Polly provides different analysis, but surely could be
more fine-granular.
> > >   - We should encapsulate the code generation part completely
from the
> > >     transformation part.
> > 
> > This separation sounds useful, but not necessarily to enable us to
write a
> > bunch of separate loop transformations. We may to split things for
> > canonicaliation vs. lowering transformations, however.
> 
> I think separate loop transformations have their merit* even if we have
> a functioning polyhedral scheduling pipeline and especially if we do not
> need to duplicate the transformation/code generation parts. Though that
> is a discussion we can have when the time comes.
Yes, we also need to adjust the pass pipeline for Polly (or any kind of
polyhedral transformation). Running e.g. the partial loop unroller
before the inliner is a very bad idea, as one can see when optimizing
libquantum.
> * We can use them for time critical compilation, to generate data for a
>   cost model for the scheduler, ...
> 
> > >  Additionally we might want to start with
> > >     classical loop transformations instead of full polyhedral
> > >     scheduling. For a lot of classical loop transformations code
> > >     generation only needs to understand/change loop induction
variables
> > >     and exit conditions. Duplication + rewriting of the loop body
is
> > >     often not necessary. Pragma driven optimizations could be
easily
> > >     done and we could also use heuristics similar to the ones we
have
> > >     today (e.g., for loop distribution). The transformations we
allow
> > >     should than grow with the scheduling decisions we allow and
these
> > >     should be limited by a yet to be determined cost model.
> > 
> > Much of this sounds potentially good, but it's not clear to me
that any of
> > this is really an argument against integrating Polly into LLVM right
now.
> 
> It depends on what you want. If you want a polyhedral scheduler right
> away, integration is the way to go. However, I don't think reuse of
> Polly for other purposes is that simple and the changes I argue for are
> (in this context) quite complicated (see below). That said, I think the
> biggest downside of integration is the lack of an immediate benefit for
> the rest of the developers and consequently the missing reason for them
> to get involved. You might have people try out Polly but as long as it
> does not deliver any benefits they will probably not spend the time to
> understanding why. Instead they might be less interested in polyhedral
> optimizations in the future.
You already pointed out in private to me that you are worried we are
"overselling" Polly. Let me make clear, Polly is a working end-to-end
Polyhedral optimizer, which I believe is very mature in terms of
robustness, but which is also at the very beginning of what can be
achieved with polyhedral transformations. I see it more as (a rather
robust) experimental backend in LLVM and we will make very clear that
this is what it is.
> > The components you propose should be usable for both separate and
integrated
> > transformations and analysis, and at least in the short term,
we'll end up
> > needing isl regardless. Is your skepticism mainly about whether the
current
> > Polly could be transitioned to use the new infrastructure (without
> > essentially rewriting all of it)? If so, I'd like to understand
why.
> 
> I do think that changing all the parts I mentioned will require a major
> rewrite and that it will be more complicated than writing the components
> "from scratch". The reason is simple, new components can be
developed
> one by one without the need to worry about the implications for later
> passes. We can design them for the uses cases at hand while keeping
> their use in a future polyhedral scheduling pipeline in mind. In Polly,
> the things I want to change have implicit and explicit ties to various
> parts in the pipeline. Consequently, it is extremely challenging to
> change them only in parts of the pipeline e.g., the modeling, at least
> if we want to keep the original functionality intact all the time.
I agree here. There is a trade-off between the cost of keeping things
working and the benefit of immediate -- large scale -- test coverage.
For some of the changes you propose -- e.g. extracting parts of Polly to
build a polyhedral value analysis -- extracting it out individually is
clearly the right approach. For other parts, this is less clear. I feel
figuring this out in detail as part of this thread will be close to
impossible. I feel doing this -- as part of targeted technical
discussions  -- will likely be more productive.

Best,
Tobias

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Sep 2017 - [RFC] Polly Status and Integration

[llvm-dev] [RFC] Polly Status and Integration

[llvm-dev] [RFC] Polly Status and Integration

[llvm-dev] [RFC] Polly Status and Integration

Possibly Parallel Threads