Tobias Grosser via llvm-dev
2017-Sep-26 07:00 UTC
[llvm-dev] [RFC] Polly Status and Integration
On Tue, Sep 26, 2017, at 00:03, Johannes Doerfert wrote:
> Hi Hal,
>
> On 09/22, Hal Finkel wrote:
> > Hi, Johannes,
> >
> > Thanks for writing this. I certainly think you have the right idea in terms of the desired end state and modular design.
>
> Thanks for the feedback!
>
> > On 09/19/2017 07:33 PM, Johannes Doerfert wrote:
> > > Hi Hal, Tobias, Michael, and others,
> > >
> > > I'd like to add my view (and a proposal) to this discussion, and I apologize up front for doing this so late*. I also want to apologize because this email is long and contains various technical details and arguments that might need more justification. However, I am happy to provide further information (and/or examples) to explain my views if required.
> > >
> > > tldr;
> > > We should introduce polyhedral analysis and optimization capabilities into LLVM step by step. Along the way we should revisit design decisions made by Polly and adjust them according to the use cases in the rest of LLVM. Finally, we should keep Polly as a stand-alone tool to allow research into, and prototyping of, complex loop transformations.
> > >
> > > * There was a paper deadline at the end of last week and I simply had to prioritize.
> > >
> > > -------------------------------------------------------------------------------
> > >
> > > LLVM performs "simple" loop transformations and targets almost exclusively inner-most loops. Polly offers a pipeline to perform "complex" polyhedral-based loop optimizations on a sequence of loop nests. It seems natural to integrate Polly into the LLVM pipeline, as it apparently complements it well. However, I do not believe it is wise to integrate Polly into LLVM, for a variety of reasons, some of which I will explain in more detail now. Afterwards I will propose a different way to achieve (most of) the things Hal mentioned in his mail in an incremental fashion. Nevertheless, I want to stress that I am still an advocate of Polly and I honestly believe it to be a valuable part of the LLVM environment, though I strongly feel it should be continued as a research tool and test bed to allow research into, and prototyping of, (polyhedral-model-based) analyses and optimizations.
> >
> > While this may end up being the conclusion, I'd find that split unfortunate. One of the strengths of LLVM is that, within the framework of a production-quality compiler, we enable new compiler research. This strengthens research not only by allowing new ideas to be surrounded by existing ones in a realistic way, so that the effects of the new ideas can be isolated convincingly, but also by encouraging new ideas to be developed and tested in a way that mirrors how they might be deployed in practice. It also encourages the research community to be part of the LLVM community, and I believe that's a positive for the community as a whole. The more we isolate the "research tool" from the rest of the ecosystem, the less value the research tool has, and the less the community as a whole benefits from interactions with researchers.
>
> While I can see your point, I am not sure that what you want is even possible. Polly is driven (almost entirely) by research people because the development model allows it. If Polly is integrated and afterwards developed to the same standards as the rest of LLVM, it will become more difficult (and time intensive) to upstream research back into Polly. Once that happens, researchers will be less likely to commit (bigger) changes back to the community and will instead only publish their code and results. I think that is a common way of doing things, and Polly is (so far) an exception due to its unique situation.

Indeed, Polly and PollyLabs in general are rather unique. I would claim that Polly and isl are -- in large parts -- developed to a standard that matches LLVM. We certainly have more experimental areas, but the extent to which we test Polly and isl every day is clearly beyond a normal research project. Only this kind of maturity allows us to scale Polly to compile projects such as AOSP. Also, in comparison to other research projects, we have a pretty strong track record of upstreaming our changes to LLVM.

> I'd even argue that a stronger separation of the different parts would allow more researchers to play around with the polyhedral tools and would also ease the first steps into this area.
>
> An example would be the OpenCL modeling we performed with Polly as a research project. It was not upstreamed, mostly because it would require a compatible OpenCL driver, but also because the modifications to Polly were substantial and hard to maintain. Several research groups and companies have since tried to replicate it or to integrate our code into their frameworks. While most of them were only interested in code analysis, they were all limited by the requirements of the polyhedral scheduler + code generation. Consequently, there were various hacks and workarounds. On the other hand, I did provide the (basically first draft of the) polyhedral memory analysis to a research group that struggled to get function-wide memory access summaries with Polly. They could directly use and improve it.

We clearly do not upstream each research prototype. ;-)

> > > Polly is a deep pipeline of passes that were developed for the sole purpose of applying scheduling transformations at the very end. This "main purpose" of Polly makes it hard to effectively reuse intermediate parts, e.g., the dependence analysis or the code generation framework, at least to the degree we might want to. In a different thread [1] Hal asked about the possibility of reusing Polly for tuning via pragmas, e.g., unrolling/interleaving/tiling/interchange. While this is interesting and generally possible, it won't take long before the restrictions Polly places on the input code (to enable scheduling optimizations in the end) become a serious problem. The point here is that Polly has too many features and dependences for such an "easy" use case. (Note that I do not question the legality of a user-requested transformation here.) If you are not looking for "end-to-end polyhedral loop optimizations", you want a system that answers a very specific question (e.g., isVectorizable) or performs a very specific transformation (e.g., tiling) on a maximal set of programs. In contrast, Polly is (and I strongly believe it should be) developed as a system that performs complex optimizations, based on specialized analysis results, on a constrained class of programs.
> > >
> > > One might argue that Polly will become more modular once it is integrated into LLVM and its intermediate results are used in more and more places. However, I'd say this is the "wrong direction" with regards to:
> > >   1) incremental design and development,
> > >   2) maintainability of individual parts, and
> > >   3) the development of a suitable cost model (which Polly does not ship).
> > >
> > > Instead of starting with a full, hard-to-understand scheduling pipeline, we should start with the use cases at hand and design specific analysis (and also transformation) passes "from scratch". The long-term goal should be a full scheduling pipeline, but the parts need to be designed in a modular (LLVM) way from the very beginning. This allows us to:
> > >   - provide __immediate benefit__ to other developers,
> > >   - allow active participation in (and understanding of) the design of each part, and
> > >   - develop the intermediate parts with the requirements of the whole LLVM project in mind.
> > >
> > > Let me give two examples to make my point:
> > >
> > > -- Example 1 --
> > > In addition to the applicability issues, there are other problems that arise from the full-pipeline design. Let's assume we want to apply a pragma-driven optimization to a loop that can be represented (and optimized) by Polly. Depending on the pragma, we only need to perform a very specific task, e.g., adjust the iteration variable (loop inversion) or introduce new loops and adjust existing iteration variables (tiling). Polly, however, will represent (and actually analyze) the whole loop nest, including the exact control flow and all accesses. In addition, it will compute alias checks, constraints under which the representation is not faithful (e.g., an access function might overflow), and a lot of other things that are not needed for the task at hand.
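
To make the tiling case concrete, here is a minimal sketch of the rewrite such a pragma would request (the array names, sizes, and tile size are made up, and it assumes N is a multiple of T to stay short). The point is that the rewrite touches only loop structure and bounds; nothing in it requires alias checks or a model of the accesses in the body:

    #include <cstddef>

    // Minimal illustration of tiling as a pure control-flow rewrite
    // (N, T, a, b are made-up names for this sketch).
    constexpr int N = 64, T = 8; // assume N % T == 0 to keep the sketch short
    static double a[N][N], b[N][N];

    void transposeAddTiled() {
      // The original nest would be:
      //   for (int i = 0; i < N; ++i)
      //     for (int j = 0; j < N; ++j)
      //       a[i][j] += b[j][i];
      // The tiled version introduces new tile loops and adjusts the bounds
      // of the existing ones; the loop body itself is left untouched.
      for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
          for (int i = ii; i < ii + T; ++i)
            for (int j = jj; j < jj + T; ++j)
              a[i][j] += b[j][i];
    }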

> > I understand your point, but I think that you're assuming too much about the semantics of the pragmas. I believe that our current vectorization pragma clauses demonstrate the philosophy we should follow with other loop pragmas in this sense: We have vectorize(enable) and vectorize(assume_safety). The latter does indeed just vectorize the loop without requiring dependency analysis, but the vectorize(enable) clause instructs the vectorizer to vectorize the loop (overriding/loosening some of the cost-model constraints) while still using the dependency analysis and generating runtime checks. Even if we have pragmas to direct the process, by default we still likely want the full dependency analysis and the framework for generating safety predicates.
>
> As I stated, I assumed "force" semantics for pragmas.
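
For reference, this is roughly how the two clauses appear in source (a minimal sketch; the functions and loops are made up, while the pragmas are the Clang loop pragmas Hal refers to):

    #include <cstddef>

    // vectorize(enable): ask the vectorizer to vectorize, but keep the
    // dependency analysis and the runtime safety checks.
    void scale(float *a, const float *b, size_t n) {
    #pragma clang loop vectorize(enable)
      for (size_t i = 0; i < n; ++i)
        a[i] = 2.0f * b[i];
    }

    // vectorize(assume_safety): the programmer asserts that there are no
    // unsafe memory dependences, so the dependency analysis is skipped.
    void scaleNoAlias(float *a, const float *b, size_t n) {
    #pragma clang loop vectorize(assume_safety)
      for (size_t i = 0; i < n; ++i)
        a[i] = 2.0f * b[i];
    }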

> > > -- Example 2 --
> > > We can also imagine that we want to determine if a loop can be vectorized, thus whether there are (short) loop-carried dependences. No matter which technique is used, the complexity of a dependence analysis will certainly increase with the number (and complexity) of the analyzed accesses. Since Polly is designed with fine-grained scheduling optimizations in mind, the dependences it computes are disproportionate with regard to the question asked.
> >
> > Could you explain this in more detail?
>
> You got the idea right in your comment below. The design will simply force us to "over-compute" dependence results.
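
As a minimal illustration of the question being asked here (both loops are made up for the example):

    // A loop-carried dependence of distance 1: iteration i reads the value
    // written by iteration i-1, so the loop cannot be vectorized naively.
    void prefixSum(float *a, int n) {
      for (int i = 1; i < n; ++i)
        a[i] = a[i] + a[i - 1];
    }

    // No loop-carried dependence (assuming a and b do not alias): all
    // iterations are independent, so the loop is vectorizable.
    void axpy(float *a, const float *b, float s, int n) {
      for (int i = 0; i < n; ++i)
        a[i] = a[i] + s * b[i];
    }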

> > > Consequently, it will either take more time to determine if there are loop-carried dependences
> >
> > I think that this is clearly a question of degree. Almost any non-lazy analysis will over-compute for a particular transformation (e.g., computing a dominator tree), but there's engineering value in consistency. The question is just the cost vs. the engineering complexity of customizing the analysis.
>
> Sure. Though, as I will discuss below, the customization effort is quite different for a "whole pipeline approach" compared to separate analyses.

Just to clarify: Polly provides different analyses, but it could surely be more fine-grained.

> > > - We should encapsulate the code generation part completely from the transformation part.
> >
> > This separation sounds useful, but not necessarily to enable us to write a bunch of separate loop transformations. We may want to split things for canonicalization vs. lowering transformations, however.
>
> I think separate loop transformations have their merit*, even if we have a functioning polyhedral scheduling pipeline, and especially if we do not need to duplicate the transformation/code generation parts. Though that is a discussion we can have when the time comes.

Yes, we also need to adjust the pass pipeline for Polly (or any kind of polyhedral transformation). Running e.g. the partial loop unroller before the inliner is a very bad idea, as one can see when optimizing libquantum.

> * We can use them for time-critical compilation, to generate data for a cost model for the scheduler, ...
>
> > > Additionally we might want to start with classical loop transformations instead of full polyhedral scheduling. For a lot of classical loop transformations, code generation only needs to understand/change loop induction variables and exit conditions. Duplication + rewriting of the loop body is often not necessary. Pragma-driven optimizations could easily be done, and we could also use heuristics similar to the ones we have today (e.g., for loop distribution). The transformations we allow should then grow with the scheduling decisions we allow, and these should be limited by a yet-to-be-determined cost model.
> >
> > Much of this sounds potentially good, but it's not clear to me that any of this is really an argument against integrating Polly into LLVM right now.
>
> It depends on what you want. If you want a polyhedral scheduler right away, integration is the way to go. However, I don't think reuse of Polly for other purposes is that simple, and the changes I argue for are (in this context) quite complicated (see below). That said, I think the biggest downside of integration is the lack of an immediate benefit for the rest of the developers, and consequently the missing reason for them to get involved. You might have people try out Polly, but as long as it does not deliver any benefits they will probably not spend the time to understand why. Instead they might be less interested in polyhedral optimizations in the future.

You already pointed out to me in private that you are worried we are "overselling" Polly. Let me make clear: Polly is a working end-to-end polyhedral optimizer, which I believe is very mature in terms of robustness, but which is also at the very beginning of what can be achieved with polyhedral transformations. I see it more as a (rather robust) experimental backend in LLVM, and we will make very clear that this is what it is.

> > The components you propose should be usable for both separate and integrated transformations and analyses, and at least in the short term, we'll end up needing isl regardless. Is your skepticism mainly about whether the current Polly could be transitioned to use the new infrastructure (without essentially rewriting all of it)? If so, I'd like to understand why.
>
> I do think that changing all the parts I mentioned will require a major rewrite, and that it will be more complicated than writing the components "from scratch". The reason is simple: new components can be developed one by one, without the need to worry about the implications for later passes. We can design them for the use cases at hand while keeping their use in a future polyhedral scheduling pipeline in mind. In Polly, the things I want to change have implicit and explicit ties to various parts of the pipeline. Consequently, it is extremely challenging to change them in only parts of the pipeline, e.g., the modeling, at least if we want to keep the original functionality intact all the time.

I agree here. There is a trade-off between the cost of keeping things working and the benefit of immediate -- large-scale -- test coverage. For some of the changes you propose -- e.g., extracting parts of Polly to build a polyhedral value analysis -- extracting them individually is clearly the right approach. For other parts, this is less clear. Figuring this out in detail as part of this thread will be close to impossible; doing so as part of targeted technical discussions will likely be more productive.

Best,
Tobias

Sebastian Pop via llvm-dev
2017-Sep-27 21:54 UTC
[llvm-dev] [RFC] Polly Status and Integration
On Tue, Sep 26, 2017 at 2:00 AM, Tobias Grosser via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Tue, Sep 26, 2017, at 00:03, Johannes Doerfert wrote:
>> It depends on what you want. If you want a polyhedral scheduler right away, integration is the way to go.

I think this is the topic of the thread, as the folks who started this discussion stated that they want to see Polly integrated in LLVM.

>> However, I don't think reuse of Polly for other purposes is that simple and the changes I argue for are (in this context) quite complicated (see below).

You are just saying that polyhedral passes that do not fully rely on the infrastructure Polly provides are difficult to write, which, in my opinion, is perfectly fine.

Polly's integration will bring to LLVM more than the loop transforms that Polly implements with isl's schedulers: there is a set of issues that Polly had to solve whose solutions will be of use to the other polyhedral analyses and optimizations that you are mentioning. Here are a few examples of the more general infrastructure that Polly enables: a license of isl compatible with LLVM, isl's compute-out facility, imath, native integer computations instead of multi-precision arithmetic, etc.
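
To make the compute-out point concrete: isl can be given an operation budget so the compiler degrades gracefully instead of hanging on pathological inputs. A minimal sketch of the pattern (the budget of 500000 is a made-up value; as far as I understand, Polly guards its expensive isl computations in a similar way):

    #include <isl/ctx.h>
    #include <isl/options.h>
    #include <isl/set.h>

    // Run a potentially expensive isl computation under an operation budget;
    // return false if the budget was exhausted and the caller should fall
    // back to a conservative answer.
    bool lexminWithBudget(isl_ctx *ctx, const char *setStr) {
      isl_options_set_on_error(ctx, ISL_ON_ERROR_CONTINUE); // do not abort
      isl_ctx_set_max_operations(ctx, 500000); // set the operation budget
      isl_ctx_reset_operations(ctx);           // start counting from zero

      isl_set *set = isl_set_read_from_str(ctx, setStr);
      isl_set *min = isl_set_lexmin(set);      // potentially expensive
      isl_set_free(min);

      bool ok = isl_ctx_last_error(ctx) != isl_error_quota;
      isl_ctx_set_max_operations(ctx, 0);      // 0 removes the limit again
      return ok;
    }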

>> I do think that changing all the parts I mentioned will require a major rewrite and that it will be more complicated than writing the components "from scratch".

Nobody is trying to stop you from writing those things from scratch. However, if you want to make those passes useful to the existing code, for example in value range propagation, or in some of the redundant code elimination passes, or in Polly, you would have to deal with the complexities of transitioning the existing code to the new analysis interfaces.

>> The reason is simple, new components can be developed one by one without the need to worry about the implications for later passes. We can design them for the use cases at hand while keeping their use in a future polyhedral scheduling pipeline in mind. In Polly, the things I want to change have implicit and explicit ties to various parts in the pipeline. Consequently, it is extremely challenging to change them only in parts of the pipeline, e.g., the modeling, at least if we want to keep the original functionality intact all the time.

This is normal operation: work in the compiler is incremental; it happens by evolution and rarely by revolution.

I will give you an example from another part of the compiler that we try to extend while keeping the existing pass working at all times: jump-threading. The original jump-threading pass was contributed more than 10 years ago, and it does pretty well at optimizing code. However, it still cannot jump-thread across multiple basic blocks or across loop back-edges: there were several attempts at extending the existing pass, without much traction. An easy way out would be to start from scratch and just implement a new pass. However, that would require implementing more than half of the analysis and code generation with code that would mostly look the same. I see no good reason to throw away that code when we could very well improve over the existing code in incremental steps: maintaining the dominators, improving the way we operate on the CFG, maintaining loops across several jump-threads, etc., the sum of which gets us closer to what we want to achieve while improving the existing code.

> For some of the changes you propose -- e.g., extracting parts of Polly to build a polyhedral value analysis -- extracting them individually is clearly the right approach.

I agree: making the infrastructure more general, and used by more code than isl's polyhedral schedulers, can happen as incremental changes as new functionality is built. If the current way Polly does things stands in the way, it should be fixed in incremental steps.

Sebastian

Johannes Doerfert via llvm-dev
2017-Sep-29 01:15 UTC
[llvm-dev] [RFC] Polly Status and Integration
Hi Sebastian,

thanks for the comments!

On 09/27, Sebastian Pop wrote:
> On Tue, Sep 26, 2017 at 2:00 AM, Tobias Grosser via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > On Tue, Sep 26, 2017, at 00:03, Johannes Doerfert wrote:
> >> It depends on what you want. If you want a polyhedral scheduler right away, integration is the way to go.
>
> I think this is the topic of the thread, as the folks who started this discussion stated that they want to see Polly integrated in LLVM.
>
> >> However, I don't think reuse of Polly for other purposes is that simple and the changes I argue for are (in this context) quite complicated (see below).
>
> You are just saying that polyhedral passes that do not fully rely on the infrastructure Polly provides are difficult to write, which, in my opinion, is perfectly fine.
>
> Polly's integration will bring to LLVM more than the loop transforms that Polly implements with isl's schedulers: there is a set of issues that Polly had to solve whose solutions will be of use to the other polyhedral analyses and optimizations that you are mentioning. Here are a few examples of the more general infrastructure that Polly enables: a license of isl compatible with LLVM, isl's compute-out facility, imath, native integer computations instead of multi-precision arithmetic, etc.

Do we need to integrate Polly to get these benefits?

> >> I do think that changing all the parts I mentioned will require a major rewrite and that it will be more complicated than writing the components "from scratch".
>
> Nobody is trying to stop you from writing those things from scratch.

I never got this impression; if it did look that way, it was by accident.

> However, if you want to make those passes useful to the existing code, for example in value range propagation, or in some of the redundant code elimination passes, or in Polly, you would have to deal with the complexities of transitioning the existing code to the new analysis interfaces.

Actually, I started to copy parts of the ScalarEvolution interface in order to integrate the analysis passes into LLVM transformations this way. While it is obviously hard (and probably not useful) to provide "exactly" the same interface as ScalarEvolution, I can look at the use cases, e.g., LoopDeletion, and try to come up with a general way to convey the needed information, e.g., "bool mayBeInfinite(Loop *L)". My point is that I can easily use the modular analysis design to do only what is needed to answer a query, without worrying about side effects that are of no concern.

Let me elaborate on this example. If you want to know whether a loop might be infinite, you only need to check the iteration domain (and the soundness of the modeling of that domain). What you explicitly do not need to look at are the accesses (in order to save time and increase applicability). To use Polly for this query you would need to modify multiple code locations, which means even more code paths, just to prevent the analysis of accesses while still allowing the domains to be represented. I am not saying it is impossible, but I am arguing that Polly is just not designed for such a use case, and we should not pretend otherwise.
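
A hypothetical sketch of such a narrow interface (the class name and the private helpers are made up for illustration; only the mayBeInfinite signature comes from the example above):

    #include "llvm/Analysis/LoopInfo.h"
    using namespace llvm;

    // Hypothetical narrow query interface: a client such as LoopDeletion
    // only needs the iteration domain, so only the domain is modeled --
    // no access analysis, no alias checks, no schedule.
    class PolyhedralDomainInfo {
    public:
      // Returns true unless the modeled iteration domain proves the loop
      // finite; conservatively true when the domain cannot be modeled.
      bool mayBeInfinite(Loop *L) const {
        if (!hasValidDomain(L))
          return true; // cannot model the domain soundly; be conservative
        return !domainIsBounded(L);
      }

    private:
      bool hasValidDomain(Loop *L) const;  // did domain modeling succeed?
      bool domainIsBounded(Loop *L) const; // finitely many iterations?
    };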

> >> The reason is simple, new components can be developed one by one without the need to worry about the implications for later passes. We can design them for the use cases at hand while keeping their use in a future polyhedral scheduling pipeline in mind. In Polly, the things I want to change have implicit and explicit ties to various parts in the pipeline. Consequently, it is extremely challenging to change them only in parts of the pipeline, e.g., the modeling, at least if we want to keep the original functionality intact all the time.
>
> This is normal operation: work in the compiler is incremental; it happens by evolution and rarely by revolution.

My point exactly.

> I will give you an example from another part of the compiler that we try to extend while keeping the existing pass working at all times: jump-threading. The original jump-threading pass was contributed more than 10 years ago, and it does pretty well at optimizing code. However, it still cannot jump-thread across multiple basic blocks or across loop back-edges: there were several attempts at extending the existing pass, without much traction. An easy way out would be to start from scratch and just implement a new pass. However, that would require implementing more than half of the analysis and code generation with code that would mostly look the same. I see no good reason to throw away that code when we could very well improve over the existing code in incremental steps: maintaining the dominators, improving the way we operate on the CFG, maintaining loops across several jump-threads, etc., the sum of which gets us closer to what we want to achieve while improving the existing code.

First, I never argued to throw anything away. Second, I do not know the details of the jump-threading evolution. Third, we also reimplement things from scratch from time to time while we keep on using the original code.

> > For some of the changes you propose -- e.g., extracting parts of Polly to build a polyhedral value analysis -- extracting them individually is clearly the right approach.
>
> I agree: making the infrastructure more general, and used by more code than isl's polyhedral schedulers, can happen as incremental changes as new functionality is built. If the current way Polly does things stands in the way, it should be fixed in incremental steps.

I tried that. One reason I had to give up is that it is simply much harder. You cannot always get rid of fundamental design choices incrementally, at least not if you do not want to disturb the rest of the pipeline while you do it. And while this is not (yet) a mission-critical component, I do think people would like to keep the scheduling capabilities of Polly working all the time.

Cheers,
Johannes

--
Johannes Doerfert
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland Informatics Campus, Germany
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de
Fax. +49 (0)681 302-3065 : http://www.cdl.uni-saarland.de/people/doerfert