Tobias Grosser wrote:
> On 04/17/2013 05:53 PM, Sebastian Pop wrote:
> > Hi,
> >
> > polly is run very early and schedules the following passes before it runs:
> >
> > /// @brief Schedule a set of canonicalization passes to prepare for Polly
> > ///
> > /// The set of optimization passes was partially taken/copied from the
> > /// set of default optimization passes in LLVM. It is used to bring the code
> > /// into a canonical form that simplifies the analysis and optimization passes
> > /// of Polly. The set of optimization passes scheduled here is probably not yet
> > /// optimal. TODO: Optimize the set of canonicalization passes.
> > static void registerCanonicalicationPasses(llvm::PassManagerBase &PM) {
> >   PM.add(llvm::createPromoteMemoryToRegisterPass());
> >   PM.add(llvm::createInstructionCombiningPass());
> >   PM.add(llvm::createCFGSimplificationPass());
> >   PM.add(llvm::createTailCallEliminationPass());
> >   PM.add(llvm::createCFGSimplificationPass());
> >   PM.add(llvm::createReassociatePass());
> >   PM.add(llvm::createLoopRotatePass());
> >   PM.add(llvm::createInstructionCombiningPass());
> >
> >   if (!SCEVCodegen)
> >     PM.add(polly::createIndVarSimplifyPass());
> >
> >   PM.add(polly::createCodePreparationPass());
> >   PM.add(polly::createRegionSimplifyPass());
>
> Right.
>
> > Sergei was saying that on some benchmarks PromoteMemoryToRegister was causing
> > performance regressions when it is run with and without Polly and scheduled
> > that early.
>
> Are you saying these passes add compile time overhead or rather that
> they cause problems with the performance of the compiled binary?

Sergei was looking at the performance of the generated code (not compile time), and yes, he looked at the impact of -O3 with the pre-passes of Polly as scheduled now vs. plain -O3.

> This means the IR needs to be canonicalized enough to allow scalar
> evolution & Co to work.

Right. Sergei also pointed out that PromoteMemoryToRegister is needed that early because otherwise SCEV would not be able to recognize induction variables allocated on the stack. If we schedule Polly in the LNO, this constraint would be satisfied.

> 2) The IR generated by Polly should be well optimized through LLVM
>
> This means we not only need to perform the optimizations that
> would have been necessary for the input code, but we also want to
> take advantage of optimization opportunities that show up after
> Polly regenerated the code.
>
> When I generated the pass ordering, I did not spend a large amount
> of time minimizing it. I rather assumed that, to be sure the
> LLVM-IR is well optimized after Polly, it would be good to just run
> all LLVM passes over the output of Polly. So I just placed
> Polly at the very beginning. Then, to enable Polly to detect
> reasonably sized scops, I scheduled a set of canonicalization passes
> before Polly (taken from the beginning of the -O3 sequence).
>
> In terms of scop coverage and quality of the generated code this
> seems to be a good choice, but it obviously will increase the
> compile time compared to a run without Polly. What we could aim for
> is to run Polly at the beginning of the loop transformations, e.g. by
> adding an extension point 'EP_LoopOptimizerStart', meaning before
> vectorization, before loop invariant code motion, and before the loop
> idiom recognition. However, we would then need to evaluate what
> cleanup passes we need to run after Polly. For the classical code
> generation strategy we probably need a couple of scalar cleanups;
> with the SCEV-based code generation there is normally a lot less to
> do.

Right: let's try to see whether with SCEVCodegen we can get better performance when scheduling Polly in the LNO.

Sebastian

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
On 04/17/2013 08:37 PM, Sebastian Pop wrote:
> Tobias Grosser wrote:
>> On 04/17/2013 05:53 PM, Sebastian Pop wrote:
>>> Hi,
>>>
>>> polly is run very early and schedules the following passes before it runs:
>>>
>>> /// @brief Schedule a set of canonicalization passes to prepare for Polly
>>> ///
>>> /// The set of optimization passes was partially taken/copied from the
>>> /// set of default optimization passes in LLVM. It is used to bring the code
>>> /// into a canonical form that simplifies the analysis and optimization passes
>>> /// of Polly. The set of optimization passes scheduled here is probably not yet
>>> /// optimal. TODO: Optimize the set of canonicalization passes.
>>> static void registerCanonicalicationPasses(llvm::PassManagerBase &PM) {
>>>   PM.add(llvm::createPromoteMemoryToRegisterPass());
>>>   PM.add(llvm::createInstructionCombiningPass());
>>>   PM.add(llvm::createCFGSimplificationPass());
>>>   PM.add(llvm::createTailCallEliminationPass());
>>>   PM.add(llvm::createCFGSimplificationPass());
>>>   PM.add(llvm::createReassociatePass());
>>>   PM.add(llvm::createLoopRotatePass());
>>>   PM.add(llvm::createInstructionCombiningPass());
>>>
>>>   if (!SCEVCodegen)
>>>     PM.add(polly::createIndVarSimplifyPass());
>>>
>>>   PM.add(polly::createCodePreparationPass());
>>>   PM.add(polly::createRegionSimplifyPass());
>>
>> Right.
>>
>>> Sergei was saying that on some benchmarks PromoteMemoryToRegister was causing
>>> performance regressions when it is run with and without Polly and scheduled
>>> that early.
>>
>> Are you saying these passes add compile time overhead or rather that
>> they cause problems with the performance of the compiled binary?
>
> Sergei was looking at the performance of the generated code (not compile time),
> and yes, he looked at the impact of -O3 with the pre-passes of Polly as
> scheduled now vs. plain -O3.

I am a little confused here. You are saying running -mem2reg decreases the quality of the generated code when run twice (once in the normal LLVM sequence and once before)? I cannot really see how -mem2reg could have any negative impact. Do you have any idea where this may come from?

>> 2) The IR generated by Polly should be well optimized through LLVM
>>
>> This means we not only need to perform the optimizations that
>> would have been necessary for the input code, but we also want to
>> take advantage of optimization opportunities that show up after
>> Polly regenerated the code.
>>
>> When I generated the pass ordering, I did not spend a large amount
>> of time minimizing it. I rather assumed that, to be sure the
>> LLVM-IR is well optimized after Polly, it would be good to just run
>> all LLVM passes over the output of Polly. So I just placed
>> Polly at the very beginning. Then, to enable Polly to detect
>> reasonably sized scops, I scheduled a set of canonicalization passes
>> before Polly (taken from the beginning of the -O3 sequence).
>>
>> In terms of scop coverage and quality of the generated code this
>> seems to be a good choice, but it obviously will increase the
>> compile time compared to a run without Polly. What we could aim for
>> is to run Polly at the beginning of the loop transformations, e.g. by
>> adding an extension point 'EP_LoopOptimizerStart', meaning before
>> vectorization, before loop invariant code motion, and before the loop
>> idiom recognition. However, we would then need to evaluate what
>> cleanup passes we need to run after Polly. For the classical code
>> generation strategy we probably need a couple of scalar cleanups;
>> with the SCEV-based code generation there is normally a lot less to
>> do.
>
> Right: let's try to see whether with SCEVCodegen we can get better performance
> when scheduling Polly in the LNO.

We can probably just schedule it there and then guess a reasonable set of cleanup passes around Polly in the case when SCEVCodegen is disabled. It seems we agree the right location is somewhere in the LNO. As said before, we could probably add it in between those two passes:

   MPM.add(createReassociatePass());           // Reassociate expressions
+  addExtensionsToPM(EP_LoopOptimizerStart, MPM);
   MPM.add(createLoopRotatePass());            // Rotate Loop

given that Hal feels it is safe to have a couple of passes between Polly and the loop vectorizer.

Tobias
Tobias Grosser wrote:
> As said before, we could probably add it in between those two passes:
>
>    MPM.add(createReassociatePass());           // Reassociate expressions
> +  addExtensionsToPM(EP_LoopOptimizerStart, MPM);
>    MPM.add(createLoopRotatePass());            // Rotate Loop

As this is in the middle of other LNO passes, can you please rename s/EP_LoopOptimizerStart/EP_Polly_LNO/, or to anything other than LoopOptimizerStart?

Thanks,
Sebastian

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation