henry miller
2013-Jan-14 12:27 UTC
[LLVMdev] [cfe-dev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang
As a user I like this it is hard to understand what each level does. I know projects that are using O3 because 'more must be better'. I don't know how to explain that it might be slower, and measuring performance is tricky. (Many programs do multpile things, and spend most of their time waiting on the event loop) Would it be unreasonable to ask for a new/seperate set of optimizations: optimize debug. This would apple agressive optimizations, but not "significantly" changing the order of the code. I don't know the optimizer, but I know as a user of compilers that minimal optimization is often the difference between painfully slow program execution and okay performance. However debugging optimized programs can be difficult because the debugger jumps all over making the problem hard to understand. I'll leave it to experts to debate shades. Chandler Carruth <chandlerc at gmail.com> wrote:>This has been an idea floating around in my head for a while and after >several discussions with others it continues to hold up so I thought I >would mail it out. Sorry for cross posting to both lists, but this is >an >issue that would significantly impact both LLVM and Clang. > >Essentially, LLVM provides canned optimization "levels" for frontends >to >re-use. This is nothing new. However, we don't have good names for >them, we >don't expose them at the IR level, and we have a hard time figuring out >which optimizations belong in which levels. I'd like to try addressing >that >by coming up with names and a description of the basic intend goal of >each >level. I would like, if folks are happy with these ideas, to add these >types of descriptions along side these attributes to the langref. Ideas >on >other (better?) places to document this would be welcome. Certainly, >Clang's documentation would need to be updated to reflect this. > >Hopefully we can minimally debate this until the bikeshed is a >tolerable >shade. Note that I'm absolutely biased based on the behavior of Clang >and >GCC with these optimization levels, and the relevant history there. >However, I'm adding and deviating from the purely historical >differences to >try and better model the latest developments in LLVM's optimizer... So >here >goes: > > >1) The easiest: 'Minimize Size' or '-Oz' >- Attribute: minsize (we already have it, nothing to do here) >- Goal: minimize the size of the resulting binary, at (nearly) any >cost. > > >2) Optimize for size or '-Os' >- Attribute: optsize (we already have it, nothing to do here) >- Goal: Optimize the execution of the binary without unreasonably[1] >increasing the binary size. >This one is a bit fuzzy, but usually people don't have a hard time >figuring >out where the line is. The primary difference between minsize and >optsize >is that with minsize a pass is free to *hurt* performance to shrink the >size. > >[1] The definition of 'unreasonable' is of course subjective, but here >is >at least one strong indicator: any code size growth which is inherently >*speculative* (that is, there isn't a known, demonstrable performance >benefit, but rather it is "often" or "maybe" a benefit) is unlikely to >be a >good fit in optsize. The canonical example IMO is a vectorizer -- while >it >is reasonable to vectorize a loop, if the vector version might not be >executed, and thus the scalar loop remains as well, then it is a poor >fit >for optsize. > > >3) Optimize quickly or '-O1' >- Attribute: quickopt (this would be a new attribute) >- Goal: Perform basic optimizations to improve both performance and >simplicity of the code, but perform them *quickly*. >This level is all about compile time, but in a holistic sense. It tries >to >perform basic optimizations to get reasonably efficient code, and get >it >very quickly. > > >4) Good, well-balanced optimizations, or '-O2' >- Attribute: opt (new attribute) >- Goal: produce a well optimized binary trading off compile time, >space, >and runtime efficiency. >This should be an excellent default for general purpose programs. The >idea >is to do as much optimization as we can, in as reasonable of a time >frame, >and with as reasonable code size impact as possible. This level should >always produce binaries at least as fast as optsize, but they might be >both >bigger and faster. This level should always produce binaries at least >as >fast as quickopt, but they might be both slower to compile. > > >5) Optimize to the max or '-O3' >- Attribute: maxopt (new attribute) >- Goal: produce the fastest binary possible. >This level has historically been almost exclusively about trading off >more >binary size for speed than '-O2', but I would propose we change it to >be >more about trading off either binary size or compilation time to >achieve a >better performing binary. This level should always produce binaries at >least as fast as opt, but they might be faster at the cost of them >being >larger and taking more time to compile. This would in some cases be a >change for LLVM and is definitely a deviation from GCC where O3 will in >many cases produce *slower* binaries due to code size increases that >are >not accompanied by corresponding performance increases. > > >To go with these LLVM attributes I'd like to add support for adding >attributes in Clang, both compatible with GCC and with the names above >for >clarity. The goal being to allow a specific function to have its >optimization level overridden from the command line based level. > > >A final note: I would like to remove all other variations on the '-O' >flag. >That includes the really strange '-O4' behavior. Whether the >compilation is >LTO should be an orthogonal decision to the particular level of >optimization, and we have -flto to achieve this. > >-Chandler > > >------------------------------------------------------------------------ > >_______________________________________________ >cfe-dev mailing list >cfe-dev at cs.uiuc.edu >http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev-- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130114/7352fbf4/attachment.html>
dag at cray.com
2013-Jan-14 18:38 UTC
[LLVMdev] [cfe-dev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang
Renato Golin Linaro <renato.golin at linaro.org> writes:> I don't think you really need a performing debug image, though.Oh, you'd be surprised. We have hit many cases where debugging a completely unoptimized program is so slow it's not worth doing. Think about large parallel codes with millions of threads running on hundreds of thousands of sockets. Debugging even minimally optimized code can be a huge productivity win. Plus optimization tends to expose things like race conditions in parallel codes. -David
Maybe Matching Threads
- [LLVMdev] [cfe-dev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang
- [LLVMdev] [cfe-dev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang
- [LLVMdev] [cfe-dev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang
- [LLVMdev] [cfe-dev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang
- [LLVMdev] [cfe-dev] RFC: Codifying (but not formalizing) the optimization levels in LLVM and Clang