Reid Spencer
2003-Nov-18 14:19 UTC
[LLVMdev] [Fwd: Optimization: Conclusions from Evolutionary Analysis]
I'm cross-posting the message below (from the GCC list) because I believe
it would (at some point) be very beneficial to build an evolutionary
optimization pass into LLVM. The idea would be to discover the best set of
optimizations for a given program by trying them all and analyzing the
execution times of each. This would be somewhat like profile-driven
optimization, except the profile is easier to gather: it's the
whole-program timing. I imagine a "mega pass" could be written that tries
all other passes in all their combinations to (eventually) arrive at the
right set for a given program.

There's one big issue with this: workload. Unlike profile-driven
optimization, which can find parts of the program to optimize based on
data collected from a run or several runs of the program, this
evolutionary pass would need to run the exact same workload multiple
times to discover which optimizations to apply. Obtaining a useful/typical
workload for a given program will be difficult. It's easy for compilers
and the like, but not all programs work that way (consider optimizing a
web server this way). Probably the right approach is to punt on the
workload issue, leaving it to the programmer to invoke/drive the program
in such a way that it executes a consistent workload.

I know it's early in LLVM's life for something like this, but I thought
I'd mention it so it can be tucked away in the "think about this later"
file.

Reid.

-----Forwarded Message-----
> From: Scott Robert Ladd <coyote at coyotegulch.com>
> To: gcc mailing list <gcc at gcc.gnu.org>
> Subject: Optimization: Conclusions from Evolutionary Analysis
> Date: Tue, 18 Nov 2003 10:11:55 -0500
>
> For those who haven't the time to read my full article, I thought I'd
> pass along the conclusions section. One of my goals with Acovea was to
> provide some empirical evidence of how optimizations behave, and to
> dispel some common misconceptions about optimization.
>
> The settings for -O1, -O2, and -O3 are valid. Every option implied by
> -O3 may not be applicable to every program -- but enabling all those
> options does not seem to be detrimental, except in the case of
> huffbench. Several of the recently added flags (-ftracer in particular)
> may be candidates for inclusion in -O2 or -O3 in future versions of the
> compiler.
>
> Processor-specific options do not appear to be a major factor in
> performance on these benchmarks. I don't know if this is due to the
> nature of the processor, or if GCC can't take advantage of
> processor-specific instructions. I have double-checked my results;
> adding -mfpmath=sse (or any of its variants, or -msse) to a compile of
> almabench does not make the code run any faster. The only ia32-specific
> option that showed consistent value was -momit-leaf-frame-pointer.
>
> The genetic algorithm was able to find sets of flags that produced
> faster code than that emitted by the default -O1, -O2, and -O3 options
> (with the exception of almabench, which is largely unoptimizable by
> nature). In many cases, this was due to the inclusion of "new" flags
> introduced in more recent versions of GCC. Acovea discovers additional
> options that improve performance over that provided by the default
> settings; in essence, the genetic algorithm determines which of the
> non-standard and processor-specific options are most effective for a
> given algorithm.
>
> A well-designed program will encapsulate algorithms.
> For most programs, performance is predicated on specific algorithms
> that can be identified via profiling; Acovea can then be applied to
> those algorithms to find the set of optimizations that produces the
> fastest code. Only rarely does an entire program need to be fully
> optimized; instead, optimization should be applied to specific,
> critical routines that perform concise tasks.
>
> The full article can be found at:
>
> http://www.coyotegulch.com/acovea/index.html

_______________________
Reid Spencer
President & CTO
eXtensible Systems, Inc.
rspencer at x10sys.com
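Reid's "mega pass" is essentially what Acovea already does at the level of
command-line flags: a genetic search that compiles the program with
candidate flag sets, runs a fixed workload, keeps the fastest candidates,
and mutates them. Below is a minimal sketch of such a driver; the candidate
flag list, bench.c, and the ./a.out workload command are illustrative
assumptions only, not part of Acovea, GCC, or LLVM.

    import random
    import subprocess
    import time

    # Illustrative candidates only; Acovea's real flag list is much larger.
    CANDIDATE_FLAGS = ["-ftracer", "-funroll-loops", "-fomit-frame-pointer",
                       "-momit-leaf-frame-pointer", "-msse", "-mfpmath=sse"]

    def compile_and_time(flags, source="bench.c", workload=("./a.out",)):
        """Compile with the given flag set, then time one run of a fixed workload."""
        subprocess.run(["gcc", "-O2", *flags, source, "-o", "a.out"], check=True)
        start = time.perf_counter()
        subprocess.run(list(workload), check=True)
        return time.perf_counter() - start

    def mutate(flag_set):
        """Toggle one randomly chosen candidate flag on or off."""
        return flag_set ^ {random.choice(CANDIDATE_FLAGS)}

    def evolve(generations=20, population=8):
        """Evolve flag sets toward the fastest observed runtime."""
        pop = [set(random.sample(CANDIDATE_FLAGS, 2)) for _ in range(population)]
        for _ in range(generations):
            ranked = sorted(pop, key=lambda s: compile_and_time(sorted(s)))
            survivors = ranked[:population // 2]        # keep the fastest half
            pop = survivors + [mutate(s) for s in survivors]
        return min(pop, key=lambda s: compile_and_time(sorted(s)))

    if __name__ == "__main__":
        print("best flag set found:", sorted(evolve()))

The same loop could in principle search over pass orderings instead of
flags, which is the workload-sensitivity problem Reid raises above.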
Vikram Adve
2003-Nov-18 17:12 UTC
[LLVMdev] [Fwd: Optimization: Conclusions from Evolutionary Analysis]
This is a hot topic in the compiler research community, but the focus
there is on

(a) choosing the right optimization sequences internally and
transparently, rather than through combinations of options,

(b) performance prediction techniques so you don't actually have to run a
gazillion different choices, and perhaps can even avoid the problem of
choosing representative inputs, as you talked about, and

(c) developing flexible internal representations so that you can
dynamically generate appropriate code sequences.

We are actually involved in a project with IBM that aims to look at all of
these issues. Needless to say, I'm hoping that they will use LLVM as one
of the components of the infrastructure for this work :) They do seem
interested.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/
Reid Spencer
2003-Nov-19 14:06 UTC
[LLVMdev] [Fwd: Optimization: Conclusions from Evolutionary Analysis]
On Tue, 2003-11-18 at 15:11, Vikram Adve wrote:
> This is a hot topic in the compiler research community, but the focus
> there is on
> (a) choosing the right optimization sequences internally and
> transparently, rather than through combinations of options,
> (b) performance prediction techniques so you don't actually have to run
> a gazillion different choices, and perhaps can even avoid the problem of
> choosing representative inputs, as you talked about, and

My experience with performance prediction (over a 20-year career) is that
it's useless. The simulation world has long sought to be able to predict
outcomes, performance, scalability, etc. I haven't seen a model yet that
gets below a +/-50% error rate. In the general case, the only *reliable*
way to do it is through collecting actual data. Of course, in the specific
case of compiler output, it might be possible to get higher levels of
accuracy. It'll be interesting to follow this research.

> (c) developing flexible internal representations so that you can
> dynamically generate appropriate code sequences.

Now THAT I agree with :)

> We are actually involved in a project with IBM that aims to look at all
> of these issues. Needless to say, I'm hoping that they will use LLVM as
> one of the components of the infrastructure for this work :) They do
> seem interested.

I hope so too ... great validation.

Reid.
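Reid's point about collecting actual data suggests the simplest workable
harness: run the same workload several times per build and compare a
robust statistic such as the median, rather than trusting a single run or
a predictive model. A minimal sketch, assuming two pre-built binaries with
placeholder names:

    import statistics
    import subprocess
    import time

    def median_runtime(cmd, runs=9):
        """Run `cmd` repeatedly and return the median wall-clock time in seconds."""
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(cmd, check=True)
            samples.append(time.perf_counter() - start)
        return statistics.median(samples)

    if __name__ == "__main__":
        baseline = median_runtime(["./bench-O2"])      # placeholder baseline build
        tuned = median_runtime(["./bench-tuned"])      # placeholder candidate build
        print(f"speedup over baseline: {baseline / tuned:.2f}x")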