On Jul 27, 2013, at 5:47 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:

> Hi, Sean:
>
> I'm sorry I lied. I didn't mean to. I did try to avoid making a *BIG* change
> to the IPO pass ordering for now. However, when I made a minor change to
> populateLTOPassManager() by separating module passes from non-module passes, I
> saw quite a few performance differences, most of them degradations. Attacking
> these degradations one by one in a piecemeal manner is a waste of time. We might as
> well define the pass ordering for the Pre-IPO, IPO and Post-IPO phases now,
> and hopefully once and for all.
>
> In order to repair my image as a liar, I am posting some preliminary results on this cozy
> Saturday afternoon, which I normally devote to daydreaming :-)
>
> So far I have only measured the MultiSource benchmarks on my iMac (late
> 2012 model), and the command to run the benchmarks is
> "make TEST=simple report OPTFLAGS='-O3 -flto'".
>
> In terms of execution time, some benchmarks degrade, but more improve, and a few of
> the improvements are quite substantial. User time is used for comparison. I measured the
> results twice, and they are basically very stable. As far as I can tell from the results,
> the proposed pass ordering is basically a change for the better.
>
> Interestingly enough, if I combine populatePreIPOPassMgr() as the pre-IPO phase
> (see the patch) with the original populateLTOPassManager() for both IPO and post-IPO,
> I see a significant improvement in "Benchmarks/Trimaran/netbench-crc/netbench-crc"
> (about 94%, 0.5665s (was) vs 0.0295s). As I write this mail, I have not yet had a chance
> to figure out why this combination improves this benchmark so much.
>
> In terms of compile time, the results report that my change improves compile
> time by about 2x, which is nonsense. I guess the test script doesn't count
> link time.
>
> The new pass orderings for Pre-IPO, IPO, and PostIPO are defined by
> populate{PreIPO|IPO|PostIPO}PassMgr().
>
> I will discuss with Andy next Monday in order to be consistent with the
> pass-ordering design he is envisioning, measure more benchmarks, and then
> post the patch and results to the community for discussion and approval.
>
> Thanks
> Shuxin

I don't have any objection to this as long as your compile times are comparable.

The major differences that I could spot are:

You've moved the second iteration of some scalar opts into post-IPO:
- JumpThreading
- CorrelatedValueProp

You no longer run InstCombine after the first round of scalar opts (in preIPO) and before the second round (in PostIPO).

You now have an extra (3rd) SROA in PostIPO.

I don't see a problem, but I'd like to understand the rationale. I think it would be valuable to capture some of the motivation behind the standard pass ordering and any changes we make to it. Sometimes part of the design becomes obsolete but no one can be sure. Shall we start a new doc under LLVM subsystems?

-Andy
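As a quick sanity check on the netbench-crc figures quoted in Shuxin's message, the two user times can be turned into a relative reduction and a speedup factor. This is only arithmetic on the two numbers given in the mail (0.5665s and 0.0295s); the variable names are illustrative and nothing here comes from the test suite itself.

```python
# User times for Benchmarks/Trimaran/netbench-crc/netbench-crc, as quoted in the mail.
before = 0.5665  # seconds, original ordering
after = 0.0295   # seconds, populatePreIPOPassMgr() + original populateLTOPassManager()

reduction = (before - after) / before  # fraction of the run time that disappeared
speedup = before / after               # how many times faster the new binary runs

print(f"time reduction: {reduction:.1%}")  # time reduction: 94.8%
print(f"speedup: {speedup:.1f}x")          # speedup: 19.2x
```

So "about 94%" corresponds to a roughly 19x speedup on this one benchmark, which is large enough that it probably reflects a qualitative change (for example, a loop being simplified away) rather than ordinary tuning noise; the mail itself leaves the cause open.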
On Mon, Jul 29, 2013 at 4:07 PM, Andrew Trick <atrick at apple.com> wrote:

> I don't see a problem, but I'd like to understand the rationale. I think
> it would be valuable to capture some of the motivation behind the standard
> pass ordering and any changes we make to it. Sometimes part of the design
> becomes obsolete but no one can be sure. Shall we start a new doc under
> LLVM subsystems?

Starting a new doc sounds like a good idea to me. If you aren't familiar with adding to the Sphinx docs, the Sphinx quickstart template will get you up and running <http://llvm.org/docs/SphinxQuickstartTemplate.html>.

-- Sean Silva
----- Original Message -----
> I don't see a problem, but I'd like to understand the rationale. I
> think it would be valuable to capture some of the motivation behind
> the standard pass ordering and any changes we make to it. Sometimes
> part of the design becomes obsolete but no one can be sure.

Out of curiosity, has anyone tried to optimize the pass ordering in some (quasi-)automated way? Naively, a genetic algorithm seems like a perfect fit for this.

-Hal

> Shall we
> start a new doc under LLVM subsystems?
>
> -Andy

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
On Mon, Jul 29, 2013 at 4:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> Out of curiosity, has anyone tried to optimize the pass ordering in some
> (quasi-)automated way? Naively, a genetic algorithm seems like a perfect
> fit for this.

This is the closest I've seen:
http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/

However, it deals with a "toy" example. Doing something similar over an entire benchmark suite would be interesting (and it may find non-obvious, highly-profitable interactions between passes that we aren't currently exploiting).

-- Sean Silva
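To make the genetic-algorithm idea concrete, here is a minimal sketch of what such a search over pass orderings could look like. Everything in it is an assumption for illustration: the pass names are used only as labels, the `cost` function is a synthetic stand-in (a real one would build and time a benchmark suite, which is the expensive part), and the population size, crossover, and mutation choices are arbitrary. It is not what the linked blog post or any LLVM tool does.

```python
import random

# Labels only; no compiler is invoked in this sketch.
PASSES = ["sroa", "instcombine", "jump-threading",
          "correlated-propagation", "gvn", "simplifycfg"]

def cost(order):
    """Synthetic stand-in for a benchmark run: lower is better.

    Assumption for the toy: "sroa" should precede "instcombine" and
    "jump-threading" should precede "simplifycfg".
    """
    penalty = 0
    if order.index("sroa") > order.index("instcombine"):
        penalty += 1
    if order.index("jump-threading") > order.index("simplifycfg"):
        penalty += 1
    return penalty

def crossover(a, b, rng):
    """Order crossover: keep a prefix of `a`, fill the rest in `b`'s order."""
    cut = rng.randrange(1, len(a))
    head = a[:cut]
    return head + [p for p in b if p not in head]

def mutate(order, rng):
    """Swap two positions to reach a neighbouring ordering."""
    i, j = rng.sample(range(len(order)), 2)
    child = list(order)
    child[i], child[j] = child[j], child[i]
    return child

def evolve(generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [rng.sample(PASSES, len(PASSES)) for _ in range(population_size)]
    for _ in range(generations):
        # Elitism: the better half survives unchanged, so the best cost never gets worse.
        population.sort(key=cost)
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return min(population, key=cost)

best = evolve()
print(best, cost(best))
```

The search loop itself is cheap; in practice the cost of one fitness evaluation (a full compile and run of the suite) and the noise in timing measurements would dominate, which is presumably why the existing example stays at "toy" scale.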
On 7/29/13 4:07 PM, Andrew Trick wrote:

> I don't have any objection to this as long as your compile times are comparable.
>
> The major differences that I could spot are:
>
> You've moved the second iteration of some scalar opts into post-IPO:
> - JumpThreading
> - CorrelatedValueProp

I don't see why we need so many iterations. So I got rid of it.

> You no longer run InstCombine after the first round of scalar opts (in preIPO) and before the second round (in PostIPO).
>
> You now have an extra (3rd) SROA in PostIPO.

I call SROA for dead code elimination, seriously! The dead-whatever-elimination passes (even the ones called aggressive) do not eliminate the last store to a local variable. Shame! Shame! Shame! It seems we don't have a better way, since we don't like mem-ssa. We have to call SROA, an all-in-one algorithm, to do this kind of thing.

> I don't see a problem, but I'd like to understand the rationale. I think it would be valuable to capture some of the motivation behind the standard pass ordering and any changes we make to it. Sometimes part of the design becomes obsolete but no one can be sure. Shall we start a new doc under LLVM subsystems?
>
> -Andy
I personally strongly abhor this kind of thing :-) I guess I should be more open-minded.

For the pre-IPO phase, some passes should not be invoked: say, any loop-nest optimization, loop versioning, aggressive loop unrolling, vectorization, or aggressive inlining. The reason is that they will hinder the downstream optimizers if they kick in early.

> Out of curiosity, has anyone tried to optimize the pass ordering in some (quasi-)automated way? Naively, a genetic algorithm seems like a perfect fit for this.
>
> -Hal
>
>> Shall we
>> start a new doc under LLVM subsystems?
>>
>> -Andy
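If an automated search were tried anyway, Shuxin's pre-IPO restriction could be expressed as a simple filter on candidate pipelines. The sketch below assumes a made-up set of labels for the kinds of passes he lists; they are placeholders for categories, not real LLVM pass names, and `violations` is a hypothetical helper.

```python
# Hypothetical labels for the pass categories named in the mail; not real pass names.
PRE_IPO_DENYLIST = {
    "loop-nest-opt",
    "loop-versioning",
    "aggressive-loop-unroll",
    "vectorize",
    "aggressive-inline",
}

def violations(pre_ipo_pipeline):
    """Return the denied passes found in a candidate pre-IPO pipeline, in order."""
    return [p for p in pre_ipo_pipeline if p in PRE_IPO_DENYLIST]

candidate = ["sroa", "early-cse", "vectorize", "simplifycfg"]
print(violations(candidate))  # ['vectorize']
```

A search procedure could reject any candidate whose pre-IPO phase has a non-empty violation list, so that hand-written design knowledge constrains the search space rather than being rediscovered (or contradicted) by it.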