C Bergström via llvm-dev
2017-Sep-13 07:16 UTC
[llvm-dev] [RFC] Polly Status and Integration
A completely non-technical point, but what's the current "polly" license? Does integrating that code conflict in any way with the work being done to relicense LLVM? Does adding Polly expose any additional legal risks? Some people from Reservoir Labs have explicitly stated to me that some of their patents target polyhedral optimizations. You should almost certainly review their portfolio or contact them.

If at some point someone wants to add real loop optimizations, will there be a conflict?

What does the DWARF look like after Polly transformations?

The talk about performance is pretty light; it would be good to get something besides just a handful of spotlight known codes. Also code size, compilation speed, etc.

------------
flag bikeshed: if it's not ready for -O3, create specific flags for specific Polly passes. Creating yet another micro flag like -O3poly just doesn't make sense to me (keep it simple). When it's really, really ready, enable it with the rest of the loop-heavy passes.

On Wed, Sep 13, 2017 at 11:26 AM, Gerolf Hoflehner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

>> On Sep 11, 2017, at 10:47 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> On 09/11/2017 12:26 PM, Adam Nemet wrote:
>>> Hi Hal, Tobias, Michael and others,
>>>
>>>> On Sep 1, 2017, at 11:47 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> As you may know, stock LLVM does not provide the kind of advanced loop transformations necessary to provide good performance on many applications. LLVM's Polly project provides many of the required capabilities, including loop transformations such as fission, fusion, skewing, blocking/tiling, and interchange, all powered by state-of-the-art dependence analysis.
>>>> Polly also provides automated parallelization and targeting of GPUs and other accelerators.
>>>>
>>>> Over the past year, Polly's development has focused on robustness, correctness, and closer integration with LLVM. To highlight a few accomplishments:
>>>>
>>>> - Polly now runs, by default, in the conceptually proper place in LLVM's pass pipeline (just before the loop vectorizer). Importantly, this means that its loop transformations are performed after inlining and other canonicalization, greatly increasing its robustness, and enabling its use on C++ code (where [] is often a function call before inlining).
>>>> - Polly's cost-modeling parameters, such as those describing the target's memory hierarchy, are being integrated with TargetTransformInfo. This allows targets to properly override the modeling parameters and allows reuse of these parameters by other clients.
>>>> - Polly's method of handling signed division/remainder operations, which worked around a lack of support in ScalarEvolution, is being replaced thanks to improvements being contributed to ScalarEvolution itself (see D34598). Polly's core delinearization routines have long been a part of LLVM itself.
>>>> - PolyhedralInfo, which exposes a subset of Polly's loop analysis for use by other clients, is now available.
>>>> - Polly is now part of the LLVM release process and is being included with LLVM by various packagers (e.g., Debian).
>>>>
>>>> I believe that the LLVM community would benefit from beginning the process of integrating Polly with LLVM itself and continuing its development as part of our main code base. This will:
>>>>
>>>> - Allow for wider adoption of LLVM within communities relying on advanced loop transformations.
>>>> - Provide for better community feedback on, and testing of, the code developed (although the story in this regard is already fairly solid).
>>>> - Better motivate targets to provide accurate, comprehensive modeling parameters for use by advanced loop transformations.
>>>> - Perhaps most importantly, this will allow us to develop and tune the rest of the optimizer assuming that Polly's capabilities are present (the underlying analysis, and eventually, the transformations themselves).
>>>>
>>>> The largest issue on which community consensus is required, in order to move forward at all, is what to do with isl. isl, the Integer Set Library, provides core functionality on which Polly depends. It is a C library, and while some Polly/LLVM developers are also isl developers, it has a large user community outside of LLVM/Polly. A C++ interface was recently added, and Polly is transitioning to use the C++ interface. Nevertheless, options here include rewriting the needed functionality, forking isl and transitioning our fork toward LLVM coding conventions (and data structures) over time, and incorporating isl more-or-less as-is to avoid partitioning its development.
>>>>
>>>> That having been said, isl is internally modular, and regardless of the overall integration strategy, the Polly developers anticipate specializing, or even replacing, some of these components with LLVM-specific solutions. This is especially true for anything that touches performance-related heuristics and modeling. LLVM-specific, or even target-specific, loop schedulers may be developed as well.
>>>>
>>>> Even though some developers in the LLVM community already have a background in polyhedral-modeling techniques, the Polly developers have developed, and are still developing, extensive tutorials on this topic: http://pollylabs.org/education.html and especially http://playground.pollylabs.org.
>>>>
>>>> Finally, let me highlight a few ongoing development efforts in Polly that are potentially relevant to this discussion.
>>>> Polly's loop analysis is sound and technically superior to what's in LLVM currently (i.e., in LoopAccessAnalysis and DependenceAnalysis). There are, however, two known reasons why Polly's transformations could not yet be enabled by default:
>>>>
>>>> - A correctness issue: currently, Polly assumes that 64 bits is large enough for all new loop-induction variables and index expressions. In rare cases, transformations could be performed where more bits are required. Preconditions need to be generated preventing this (e.g., D35471).
>>>> - A performance issue: Polly currently models temporal locality (i.e., it tries to get better reuse in time), but does not model spatial locality (i.e., it does not model cache-line reuse). As a result, it can sometimes introduce performance regressions. Polly Labs is currently working on integrating spatial-locality modeling into the loop optimization model.
>>>>
>>>> Polly can already split apart basic blocks in order to implement loop fusion. Heuristics to choose the granularity are still being implemented (e.g., PR12402).
>>>>
>>>> I believe that we can now develop a concrete plan for moving state-of-the-art loop optimizations, based on the technology in the Polly project, into LLVM. Doing so will enable LLVM to be competitive with proprietary compilers in high-performance computing, machine learning, and other important application domains. I'd like community feedback on what should be part of that plan.
>>>
>>> One thing that I'd like to see more details on is what this means for the evolution of loop transformations in LLVM.
>>>
>>> Our more-or-less established direction was so far to incrementally improve and generalize the required analyses (e.g., the LoopVectorizer's dependence analysis + loop-versioning analysis into a stand-alone analysis pass (LoopAccessAnalysis)) and then build new transformations (e.g., LoopDistribution, LoopLoadElimination, LICMLoopVersioning) on top of these.
>>> The idea was that infrastructure would be incrementally improved from two directions:
>>>
>>> - As new transformations are built, analyses have to be improved (e.g., past improvements to LAA to support the LoopVersioning utility, future improvements for full LoopSROA beyond just store->load forwarding [1], or the improvements to LAA for the LoopFusion proposal [2]).
>>> - As more complex loops would have to be analyzed, we either improve LAA or make DependenceAnalysis a drop-in replacement for the memory-analysis part of LAA.
>>
>> Or we could use Polly's dependence analysis, which I believe to be more powerful, more robust, and more correct than DependenceAnalysis. I believe that the difficult part here is actually the pairing with predicated SCEV or whatever mechanism we want to use to generate runtime predicates (this applies to use of DependenceAnalysis too).
>
> What is a good way to measure these assertions (more powerful, more robust)? Are you saying the LLVM DependenceAnalysis is incorrect, or do you actually mean less conservative (or "more accurate" or something like that)?
>
>>> While this model may be slow, it has all the benefits of the incremental development model.
>>
>> The current model may have been slow in many areas, but I think that's mostly a question of development effort. My largest concern about the current model is that, to the extent that we're implementing classic loop transformations (e.g., fusion, distribution, interchange, skewing, tiling, and so on), we're repeating a historical design that is known to have several suboptimal properties. Chief among them is the lack of integration: many of these transformations are interconnected, and there's no good pass ordering in which to make independent decisions. Many of these transformations can be captured in a single model, and we can get much better results by integrating them.
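[Editor's illustration, not from the thread: the "runtime predicates" mentioned above are overlap/validity checks emitted when static analysis cannot prove safety. A hand-written sketch of the loop-versioning pattern such predicates enable; this is not actual Polly or LLVM output:]

```c
#include <stddef.h>
#include <stdint.h>

/* If dst and src cannot be proven disjoint at compile time, the
 * compiler can emit a runtime overlap check and keep two versions of
 * the loop: a fast one that is safe to vectorize, and a conservative
 * fallback that preserves the original execution order. */
void add_arrays(double *dst, const double *src, size_t n) {
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    if (d + n * sizeof(double) <= s || s + n * sizeof(double) <= d) {
        /* Predicate holds: no overlap, hence no loop-carried
         * dependence; this version may be vectorized. */
        for (size_t i = 0; i < n; ++i)
            dst[i] += src[i];
    } else {
        /* Possible overlap: original, order-preserving loop. */
        for (size_t i = 0; i < n; ++i)
            dst[i] += src[i];
    }
}
```

The hard part the message alludes to is generating such predicates systematically (e.g., via predicated SCEV) rather than by hand, and keeping their runtime cost below the payoff of the fast path.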
>> There's also the matter of whether building these transformations on SCEV (or IR directly) is the best underlying infrastructure, or whether parts of Polly would be better.
>
> I believe that is true. What I wonder is: is there a good method to reason about it? Perhaps concrete examples, or perhaps opt-viewer-based comparisons on large sets of benchmarks? In the big picture you could make such a modeling argument for all compiler optimizations.
>
>> That having been said, I think that integrating this technology into LLVM will also mean applying appropriate modularity. I think that we'll almost definitely want to make use of the dependence analysis separately as an analysis. We'll want to decide which of these transformations will be considered canonicalization (and run in the iterative pipeline) and which will be lowering (and run near the vectorizer). LoopSROA certainly sounds to me like canonicalization, but loop fusion might also fall into that category (i.e., we might want to fuse early to enable optimizations and then split late).
>
>>> Then there is the question of use cases. It's fairly obvious that anybody wanting to optimize a 5-deep, highly regular loop nest operating on arrays should use Polly. On the other hand, it's way less clear that we should use it for singly or doubly nested not-so-regular loops, which are the norm in non-HPC workloads.
>>
>> This is clearly a good question, but thinking about Polly as a set of components, not as a monolithic transformation component, I think that polyhedral analysis and transformations can underlie a lot of the transformations we need for non-HPC code (and, which I'll point out, we need for modern HPC code too). In practice, the loops that we can actually analyze have affine dependencies, and Polly does, or can do, a better job at generating runtime predicates and dealing with piecewise-linear expressions than our current infrastructure.
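[Editor's illustration, not from the thread, of the "affine" restriction mentioned above: polyhedral analysis handles array subscripts that are affine functions of the loop induction variables and parameters; the function name and shapes below are made up for the example:]

```c
#define N 64

/* The analyzable case: the subscript 2*i + j + 3 is an affine
 * (linear-plus-constant) function of the induction variables, so the
 * access pattern can be described exactly as a polyhedral relation. */
void copy_affine(double A[N][N], const double B[N * N]) {
    for (int i = 0; i < N - 3; ++i)
        for (int j = 0; j < N; ++j)
            A[i][j] = B[2 * i + j + 3];
    /* Outside the model, by contrast:
     *   B[i * j]    -- product of two induction variables (non-affine)
     *   B[idx[i]]   -- data-dependent (indirect) subscript */
}
```

Loops like the first one are where dependence analysis, runtime-predicate generation, and piecewise-linear reasoning apply; the commented-out forms are what pushes a loop outside the polyhedral model entirely.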
>> In short, I look at Polly as two things: first, an infrastructure for dealing with loop analysis and transformation, which I view as being broadly applicable; second, an application of that to apply cost-model-driven classic loop transformations. To some extent this is going to be more useful for HPC codes, but it also applies to machine learning, signal processing, graphics, and other areas.
>
> I'm wondering if it could be used for pointing out headroom for the existing LLVM ecosystem (*).
>
>>> And this brings me to the maintenance question. Is it reasonable to expect people to fix Polly when they have a seemingly unrelated change that happens to break a Polly bot?
>>
>> The eventual goal here is to have this technology in appropriate parts of the main pipeline, and so the question here is not really about breaking a "Polly bot", but just about a "bot" in general. I've given this question some thought and I think it sits in a reasonable place in the risk-reward space. The answer would be: yes, we'd need to treat this like any other part of the pipeline. However, I believe that Polly has as many active contributors as, or more than, essentially any other individual part of the mid-level optimizer or CodeGen. As a result, there will be people around in many time zones to help with problems with Polly-related code.
>
>>> As far as I know, there were companies in the past that tried Polly without a whole lot of prior experience. It would be great to hear what the experience was before adopting Polly at a much larger scale.
>>
>> I'm also interested, although I'll caution against over-interpreting any evidence here (positive or negative). Before a few weeks ago, Polly didn't effectively run in the pipeline after inlining, and so I doubt it would have been much use outside of embedded environments (and maybe some HPC environments) with straightforwardly-presented C code.
>> It's only now that this has been fixed that I find the possibility of integrating this in production interesting.
>
> That is a good point. There are also biases independent of past experiences (for disclosure, mine is (*) above). But I think it is objective to say that a Polly integration is a big piece to swallow. Your pro-Polly argument lists a number of categories that I think could be reasoned about individually and partly evaluated with a data-driven approach:
>
> A) Architecture
>    - support for autoparallelism
>    - support for accelerators
>    - isl: rewrite? etc.
>    ...
> B) Modelling
>    - polyhedral model
>    - temporal locality
>    - spatial locality
>    ...
> C) Analysis/Optimizations
>    - dependence analysis
>    - transformation effectiveness/power (loop nests, quality of transformations, #vectorizable loops, etc.)
>
> A) is mostly Polly-independent (except for the isl question, I guess). For B and C, performance / compile-time / opt-viewer data on a decently wide range of benchmarks, possibly at different optimization levels (O2, O3, LTO, PGO, etc., and combinations), should provide data-driven insight into costs/benefits.
>
> Cheers,
> Gerolf
>
>> Thanks again,
>> Hal
>
>>> Adam
>>>
>>> [1] http://lists.llvm.org/pipermail/llvm-dev/2015-November/092017.html
>>> [2] http://lists.llvm.org/pipermail/llvm-dev/2016-March/096266.html
>
>>>> Sincerely,
>>>> Hal (on behalf of myself, Tobias Grosser, and Michael Kruse, with feedback from several other active Polly developers)
>>>>
>>>> We thank the numerous people who have contributed to the Polly infrastructure: Alexandre Isoard, Andreas Simbuerger, Andy Gibbs, Annanay Agarwal, Armin Groesslinger, Ajith Pandel, Baranidharan Mohan, Benjamin Kramer, Bill Wendling, Chandler Carruth, Craig Topper, Chris Jenneisch, Christian Bielert, Daniel Dunbar, Daniel Jasper, David Blaikie, David Peixotto, Dmitry N. Mikushin, Duncan P. N. Exon Smith, Eli Friedman, Eugene Zelenko, George Burgess IV, Hans Wennborg, Hongbin Zheng, Huihui Zhang, Jakub Kuderski, Johannes Doerfert, Justin Bogner, Karthik Senthil, Logan Chien, Lawrence Hu, Mandeep Singh Grang, Matt Arsenault, Matthew Simpson, Mehdi Amini, Micah Villmow, Michael Kruse, Matthias Reisinger, Maximilian Falkenstein, Nakamura Takumi, Nandini Singhal, Nicolas Bonfante, Patrik Hägglund, Paul Robinson, Philip Pfaffe, Philipp Schaad, Peter Conn, Pratik Bhatu, Rafael Espindola, Raghesh Aloor, Reid Kleckner, Roal Jordans, Richard Membarth, Roman Gareev, Saleem Abdulrasool, Sameer Sahasrabuddhe, Sanjoy Das, Sameer AbuAsal, Sam Novak, Sebastian Pop, Siddharth Bhat, Singapuram Sanjay Srivallabh, Sumanth Gundapaneni, Sunil Srivastava, Sylvestre Ledru, Star Tan, Tanya Lattner, Tim Shen, Tarun Ranjendran, Theodoros Theodoridis, Utpal Bora, Wei Mi, Weiming Zhao, and Yabin Hu.
>>>>
>>>> --
>>>> Hal Finkel
>>>> Lead, Compiler Technology and Programming Languages
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Hal Finkel via llvm-dev
2017-Sep-13 11:43 UTC
[llvm-dev] [RFC] Polly Status and Integration
On 09/13/2017 02:16 AM, C Bergström wrote:

> A completely non-technical point, but what's the current "polly" license? Does integrating that code conflict in any way with the work being done to relicense llvm?

Good question. I discussed this explicitly with Tobias, and his general feeling is that relicensing isl again would be doable if necessary (we already did this once, to an MIT license, in order to enable better LLVM integration).

> Does adding polly expose any additional legal risks? Some people from Reservoir Labs have explicitly stated to me that some of their patents target polyhedral optimizations. You should almost certainly review their portfolio or contact them.
>
> If at some point someone wants to add real loop optimizations - will there be a conflict?

Can you define "real loop optimizations"?

> What's the DWARF look like after poly transformations?

Right now, Polly essentially changes loop structures around existing basic blocks (so the debug info on the loop bodies should be okay). Like most of our other loop optimizations, induction variables don't fare very well (an area where improvement is generally needed).

> The talk about performance is pretty light - it would be good to get something besides just a handful of spotlight known codes. Also code size, compilation speed, etc.

This is a good point, and more information should definitely be provided in this regard. The larger question, from my perspective, is: will the infrastructure significantly help us get from where we are to where we want to be?

> ------------
> flag bikeshed - If it's not ready for -O3 - create specific flags to specific poly passes. Creating yet another micro flag like -O3poly just doesn't make sense to me. (keep it simple.) When it's really really ready, enable it with the rest of the loop heavy passes.

Regarding transformations, this is also my preference. We'll reach a much smaller audience if special flags are required.
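[Editor's note: for concreteness, a sketch of the opt-in spelling that exists today, assuming a clang built with Polly enabled; flag names are from the Polly docs of this era and should be treated as illustrative, not a proposal:]

```shell
# Enable Polly's transformations behind -O3 (opt-in via -mllvm flags,
# not a separate -O3poly optimization level).
clang -O3 -mllvm -polly file.c -o file

# Also run Polly on loops its cost model considers unprofitable.
clang -O3 -mllvm -polly -mllvm -polly-process-unprofitable file.c -o file
```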
There are also two aspects of "ready" here: it might be ready as an analysis infrastructure well before we can enable loop restructuring by default. Thanks again, Hal> > > > On Wed, Sep 13, 2017 at 11:26 AM, Gerolf Hoflehner via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > > >> On Sep 11, 2017, at 10:47 PM, Hal Finkel via llvm-dev >> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> >> On 09/11/2017 12:26 PM, Adam Nemet wrote: >>> Hi Hal, Tobias, Michael and others, >>> >>>> On Sep 1, 2017, at 11:47 AM, Hal Finkel via llvm-dev >>>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>> >>>> >>>> ** >>>> >>>> *Hi everyone,As you may know, stock LLVM does not provide the >>>> kind of advanced loop transformations necessary to provide good >>>> performance on many applications. LLVM's Polly project provides >>>> many of the required capabilities, including loop >>>> transformations such as fission, fusion, skewing, >>>> blocking/tiling, and interchange, all powered by >>>> state-of-the-art dependence analysis. Polly also provides >>>> automated parallelization and targeting of GPUs and >>>> other**accelerators.* >>>> * >>>> Over the past year, Polly’s development has focused on >>>> robustness, correctness, and closer integration with LLVM. To >>>> highlight a few accomplishments: >>>> >>>> * >>>> Polly now runs, by default, in the conceptually-proper >>>> place in LLVM’s pass pipeline (just before the loop >>>> vectorizer). Importantly, this means that its loop >>>> transformations are performed after inlining and other >>>> canonicalization, greatly increasing its robustness, and >>>> enabling its use on C++ code (where [] is often a function >>>> call before inlining). >>>> * >>>> Polly’s cost-modeling parameters, such as those describing >>>> the target’s memory hierarchy, are being integrated with >>>> TargetTransformInfo. 
This allows targets to properly >>>> override the modeling parameters and allows reuse of these >>>> parameters by other clients. >>>> * >>>> Polly’s method of handling signed division/remainder >>>> operations, which worked around lack of support in >>>> ScalarEvolution, is being replaced thanks to improvements >>>> being contributed to ScalarEvolution itself (see D34598). >>>> Polly’s core delinearization routines have long been a part >>>> of LLVM itself. >>>> * >>>> PolyhedralInfo, which exposes a subset of Polly’s loop >>>> analysis for use by other clients, is now available. >>>> * >>>> Polly is now part of the LLVM release process and is being >>>> included with LLVM by various packagers (e.g., Debian). >>>> >>>> >>>> I believe that the LLVM community would benefit from beginning >>>> the process of integrating Polly with LLVM itself and >>>> continuing its development as part of our main code base. This >>>> will: >>>> >>>> * >>>> Allow for wider adoption of LLVM within communities relying >>>> on advanced loop transformations. >>>> * >>>> Provide for better community feedback on, and testing of, >>>> the code developed (although the story in this regard is >>>> already fairly solid). >>>> * >>>> Better motivate targets to provide accurate, comprehensive, >>>> modeling parameters for use by advanced loop transformations. >>>> * >>>> Perhaps most importantly, this will allow us to develop and >>>> tune the rest of the optimizer assuming that Polly’s >>>> capabilities are present (the underlying analysis, and >>>> eventually, the transformations themselves). >>>> >>>> >>>> The largest issue on which community consensus is required, in >>>> order to move forward at all, is what to do with isl. isl, the >>>> Integer Set Library, provides core functionality on which Polly >>>> depends. It is a C library, and while some Polly/LLVM >>>> developers are also isl developers, it has a large user >>>> community outside of LLVM/Polly. 
A C++ interface was recently >>>> added, and Polly is transitioning to use the C++ interface. >>>> Nevertheless, options here include rewriting the needed >>>> functionality, forking isl and transitioning our fork toward >>>> LLVM coding conventions (and data structures) over time, and >>>> incorporating isl more-or-less as-is to avoid partitioning its >>>> development. >>>> >>>> That having been said, isl is internally modular, and >>>> regardless of the overall integration strategy, the Polly >>>> developers anticipate specializing, or even replacing, some of >>>> these components with LLVM-specific solutions. This is >>>> especially true for anything that touches performance-related >>>> heuristics and modeling. LLVM-specific, or even >>>> target-specific, loop schedulers may be developed as well. >>>> >>>> Even though some developers in the LLVM community already have >>>> a background in polyhedral-modeling techniques, the Polly >>>> developers have developed, and are still developing, extensive >>>> tutorials on this topic http://pollylabs.org/education.html >>>> <http://pollylabs.org/education.html>and especially >>>> http://playground.pollylabs.org >>>> <http://playground.pollylabs.org/>. >>>> Finally, let me highlight a few ongoing development efforts in >>>> Polly that are potentially relevant to this discussion. Polly’s >>>> loop analysis is sound and technically superior to what’s in >>>> LLVM currently (i.e. in LoopAccessAnalysis and >>>> DependenceAnalysis). There are, however, two known reasons why >>>> Polly’s transformations could not yet be enabled by default: >>>> >>>> * >>>> A correctness issue: Currently, Polly assumes that 64 bits >>>> is large enough for all new loop-induction variables and >>>> index expressions. In rare cases, transformations could be >>>> performed where more bits are required. Preconditions need >>>> to be generated preventing this (e.g., D35471). 
>>>> * >>>> A performance issue: Polly currently models temporal >>>> locality (i.e., it tries to get better reuse in time), but >>>> does not model spatial locality (i.e., it does not model >>>> cache-line reuse). As a result, it can sometimes introduce >>>> performance regressions. Polly Labs is currently working on >>>> integrating spatial locality modeling into the loop >>>> optimization model. >>>> >>>> Polly can already split apart basic blocks in order to >>>> implement loop fusion. Heuristics to choose at which >>>> granularity are still being implemented (e.g., PR12402). >>>> I believe that we can now develop a concrete plan for moving >>>> state-of-the-art loop optimizations, based on the technology in >>>> the Polly project, into LLVM. Doing so will enable LLVM to be >>>> competitive with proprietary compilers in high-performance >>>> computing, machine learning, and other important application >>>> domains. I’d like community feedback on what**should be part of >>>> that plan. >>>> * >>> One thing that I’d like to see more details on is what this >>> means for the evolution of loop transformations in LLVM. >>> Our more-or-less established direction was so far to >>> incrementally improve and generalize the required analyses (e.g. >>> the LoopVectorizer’s dependence analysis + loop versioning >>> analysis into a stand-alone analysis pass (LoopAccessAnalysis)) >>> and then build new transformations (e.g. LoopDistribution, >>> LoopLoadElimination, LICMLoopVersioning) on top of these. >>> The idea was that infrastructure would be incrementally improved >>> from two directions: >>> - As new transformations are built analyses have to be improved >>> (e.g. 
past improvements to LAA to support the LoopVersioning >>> utility, future improvements for full LoopSROA beyond just >>> store->load forwarding [1] or the improvements to LAA for the >>> LoopFusion proposal[2]) >>> - As more complex loops would have to be analyzed we either >>> improve LAA or make DependenceAnalysis a drop-in replacement for >>> the memory analysis part in LAA >> Or we could use Polly's dependence analysis, which I believe to >> be more powerful, more robust, and more correct than >> DependenceAnalysis. I believe that the difficult part here is >> actually the pairing with predicated SCEV or whatever mechanism >> we want to use generate runtime predicates (this applies to use >> of DependenceAnalysis too). > What is a good way to measure these assertions (More powerful, > more robust)? Are you saying the LLVM Dependence Analysis is > incorrect or do you actually mean less conservative (or "more > accurate" or something like that)? >>> While this model may be slow it has all the benefits of the >>> incremental development model. >> The current model may have been slow in many areas, but I think >> that's mostly a question of development effort. My largest >> concern about the current model is that, to the extent that we're >> implementing classic loop transformations (e.g., fusion, >> distribution, interchange, skewing, tiling, and so on), we're >> repeating a historical design that is known to have several >> suboptimal properties. Chief among them is the lack of >> integration: many of these transformations are interconnected, >> and there's no good pass ordering in which to make independent >> decisions. Many of these transformations can be captured in a >> single model and we can get much better results by integrating >> them. There's also the matter of whether building these >> transformation on SCEV (or IR directly) is the best underlying >> infrastructure, or whether parts of Polly would be better. > I believe that is true. 
What I wonder is is there a good method to > reason about it? Perhaps concrete examples or perhaps opt-viewer > based comparisons on large sets of benchmarks? In the big picture > you could make such a modeling argument for all compiler > optimizations. >> That having been said, I think that integrating this technology >> into LLVM will also mean applying appropriate modularity. I think >> that we'll almost definitely want to make use of the dependence >> analysis separately as an analysis. We'll want to decide which of >> these transformations will be considered canonicalization (and >> run in the iterative pipeline) and which will be lowering (and >> run near the vectorizer). LoopSROA certainly sounds to me like >> canonicalization, but loop fusion might also fall into that >> category (i.e., we might want to fuse early to enable >> optimizations and then split late). >>> Then there is the question of use cases. It’s fairly obvious >>> that anybody wanting to optimize a 5-deep highly regular >>> loop-nest operating on arrays should use Polly. On the other >>> hand it’s way less clear that we should use it for singly or >>> doubly nested not-so-regular loops which are the norm in non-HPC >>> workloads. >> This is clearly a good question, but thinking about Polly as a >> set of components, not as a monolithic transformation component, >> I think that polyhedral analysis and transformations can underlie >> a lot of the transformations we need for non-HPC code (and, which >> I'll point out, we need for modern HPC code too). In practice, >> the loops that we can actually analyze have affine dependencies, >> and Polly does, or can do, a better job at generating runtime >> predicates and dealing with piecewise-linear expressions than our >> current infrastructure. In short, I look at Polly as two things: >> First, an infrastructure for dealing with loop analysis and >> transformation. I view this as being broadly applicable. 
Second, >> an application of that to apply cost-model-driven classic loop >> transformations. To some extent this is going to be more useful >> for HPC codes, but also applies to machine learning, signal >> processing, graphics, and other areas. > I’m wondering if it could be used for pointing out headroom for > the existing LLVM ecosystem (*) >>> And this brings me to the maintenance question. Is it >>> reasonable to expect people to fix Polly when they have a >>> seemingly unrelated change that happens to break a Polly bot. >> The eventual goal here is to have this technology in appropriate >> parts of the main pipeline, and so the question here is not >> really about breaking a "Polly bot", but just about a "bot" in >> general. I've given this question some thought and I think it >> sits in a reasonable place in the risk-reward space. The answer >> would be, yes, we'd need to treat this like any other part of the >> pipeline. However, I believe that Polly has as many, or more, >> active contributors than essentially any other individual part of >> the mid-level optimizer or CodeGen. As a result, there will be >> people around in many time zones to help with problems with >> Polly-related code. >>> As far as I know, there were companies in the past that tried >>> Polly without a whole lot of prior experience. It would be >>> great to hear what the experience was before adopting Polly at a >>> much larger scale. >> I'm also interested, although I'll caution against >> over-interpreting any evidence here (positive or negative). >> Before a few weeks ago, Polly didn't effectively run in the >> pipeline after inlining, and so I doubt it would have been much >> use outside of embedded environments (and maybe some HPC >> environments) with straightforwardly-presented C code. It's only >> now that this has been fixed that I find the possibility of >> integrating this in production interesting. > That is a good point. 
> There are also biases independent of past experiences (for disclosure, mine is (*) above). But I think it is objective to say a Polly integration is a big piece to swallow. Your pro-Polly argument lists a number of categories that I think could be reasoned about individually and partly evaluated with a data-driven approach:
>
> A) Architecture
> - support for autoparallelism
> - support for accelerators
> - isl rewrite? etc.
> ...
> B) Modelling
> - polyhedral model
> - temporal locality
> - spatial locality
> ...
> C) Analysis/Optimizations
> - Dependence Analysis
> - Transformation effectiveness/power (loop nests, quality of transformations, #vectorizable loops, etc.)
>
> A) is mostly Polly-independent (except for the isl question, I guess). For B and C, performance/compile-time/opt-viewer data on a decent/wide range of benchmarks, possibly at different optimization levels (O2, O3, LTO, PGO, etc., and combinations), should provide data-driven insight into costs/benefits.
>
> Cheers,
> Gerolf

>> Thanks again,
>> Hal

>>> Adam
>>>
>>> [1] http://lists.llvm.org/pipermail/llvm-dev/2015-November/092017.html
>>> [2] http://lists.llvm.org/pipermail/llvm-dev/2016-March/096266.html

>>>> Sincerely,
>>>> Hal (on behalf of myself, Tobias Grosser, and Michael Kruse, with feedback from several other active Polly developers)
>>>>
>>>> We thank the numerous people who have contributed to the Polly infrastructure: Alexandre Isoard, Andreas Simbuerger, Andy Gibbs, Annanay Agarwal, Armin Groesslinger, Ajith Pandel, Baranidharan Mohan, Benjamin Kramer, Bill Wendling, Chandler Carruth, Craig Topper, Chris Jenneisch, Christian Bielert, Daniel Dunbar, Daniel Jasper, David Blaikie, David Peixotto, Dmitry N. Mikushin, Duncan P. N. Exon Smith, Eli Friedman, Eugene Zelenko, George Burgess IV, Hans Wennborg, Hongbin Zheng, Huihui Zhang, Jakub Kuderski, Johannes Doerfert, Justin Bogner, Karthik Senthil, Logan Chien, Lawrence Hu, Mandeep Singh Grang, Matt Arsenault, Matthew Simpson, Mehdi Amini, Micah Villmow, Michael Kruse, Matthias Reisinger, Maximilian Falkenstein, Nakamura Takumi, Nandini Singhal, Nicolas Bonfante, Patrik Hägglund, Paul Robinson, Philip Pfaffe, Philipp Schaad, Peter Conn, Pratik Bhatu, Rafael Espindola, Raghesh Aloor, Reid Kleckner, Roal Jordans, Richard Membarth, Roman Gareev, Saleem Abdulrasool, Sameer Sahasrabuddhe, Sanjoy Das, Sameer AbuAsal, Sam Novak, Sebastian Pop, Siddharth Bhat, Singapuram Sanjay Srivallabh, Sumanth Gundapaneni, Sunil Srivastava, Sylvestre Ledru, Star Tan, Tanya Lattner, Tim Shen, Tarun Ranjendran, Theodoros Theodoridis, Utpal Bora, Wei Mi, Weiming Zhao, and Yabin Hu.
>>>>
>>>> --
>>>> Hal Finkel
>>>> Lead, Compiler Technology and Programming Languages
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
C Bergström via llvm-dev
2017-Sep-13 11:53 UTC
[llvm-dev] [RFC] Polly Status and Integration
On Wed, Sep 13, 2017 at 7:43 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On 09/13/2017 02:16 AM, C Bergström wrote:
>> A completely non-technical point, but what's the current "polly" license? Does integrating that code conflict in any way with the work being done to relicense llvm?
>
> Good question. I discussed this explicitly with Tobias, and his general feeling is that relicensing isl again would be doable if necessary (we already did this once, to an MIT license, in order to enable better LLVM integration).
>
>> Does adding polly expose any additional legal risks? Some people from Reservoir Labs have explicitly stated to me that some of their patents target polyhedral optimizations. You should almost certainly review their portfolio or contact them.
>>
>> If at some point someone wants to add real loop optimizations - will there be a conflict?
>
> Can you define "real loop optimizations"?

I think most readers here will understand what I mean. I can go find specific chapters of textbooks if it's unclear. Maybe the word "real" could be replaced with traditional, well-tested, industry-standard, or something else. (OK, I'll stop being snarky.)

I really do appreciate your feedback, and I do think something beyond just a soft discussion is required on the IP/license vetting. The relicensing process used before should be substantially similar to the process which LLVM is going to use. There's a big difference between someone randomly changing a license header and nobody complaining vs. getting explicit and signed agreements from all copyright holders.

Further, my reading of some of the patents causes significant concerns. (A point everyone will want to ignore until it's too late.) I'm avoiding exact references, but soon I'll start listing exact patents if nobody else cares.
Tobias Grosser via llvm-dev
2017-Sep-13 12:26 UTC
[llvm-dev] [RFC] Polly Status and Integration
On Wed, Sep 13, 2017, at 13:43, Hal Finkel wrote:
> On 09/13/2017 02:16 AM, C Bergström wrote:
>> A completely non-technical point, but what's the current "polly" license? Does integrating that code conflict in any way with the work being done to relicense llvm?
>
> Good question. I discussed this explicitly with Tobias, and his general feeling is that relicensing isl again would be doable if necessary (we already did this once, to an MIT license, in order to enable better LLVM integration).

Right. isl was relicensed to MIT following advice from Chris Lattner. We got written consent from all contributors and copyright owners. If the need arises -- we can look into this as part of the LLVM relicensing process -- with even better legal advice.

>> Does adding polly expose any additional legal risks? Some people from Reservoir Labs have explicitly stated to me that some of their patents target polyhedral optimizations. You should almost certainly review their portfolio or contact them.
>>
>> If at some point someone wants to add real loop optimizations - will there be a conflict?
>
> Can you define "real loop optimizations"?
>
>> What's the DWARF look like after poly transformations?
>
> Right now, Polly essentially changes loop structures around existing basic blocks (so the debug info on the loop bodies should be okay). Like most of our other loop optimizations, induction variables don't fare very well (an area where improvement is generally needed).

Right.

>> The talk about performance is pretty light - it would be good to get something besides just a handful of spotlight known codes. Also code size, compilation speed, etc.
>
> This is a good point, and more information should definitely be provided in this regard. The larger question, from my perspective, is: will the infrastructure significantly help us get from where we are to where we want to be?
>> ------------
>> flag bikeshed - if it's not ready for -O3, create specific flags for specific Polly passes. Creating yet another micro flag like -O3poly just doesn't make sense to me. (Keep it simple.) When it's really, really ready, enable it with the rest of the loop-heavy passes.
>
> Regarding transformations, this is also my preference. We'll reach a much smaller audience if special flags are required. There are also two aspects of "ready" here: it might be ready as an analysis infrastructure well before we can enable loop restructuring by default.

Sure, that's the ultimate goal and clearly something we should shoot for "soon".

Best,
Tobias