Andrea_DiBiagio at sn.scee.net
2013-Apr-24 13:00 UTC
[LLVMdev] [PROPOSAL] per-function optimization level control
Hello,

We've had a high priority feature request from a number of our customers to provide per-function optimization in our Clang/LLVM compiler. I would be interested in working with the community to implement this. The idea is to allow the optimization level to be overridden for specific functions.

The rest of this proposal is organized as follows:
- Section 1. describes this new feature and explains why and when per-function optimization options are useful;
- Sections 2. and 3. describe how the optimizer could be adapted/changed to allow the definition of per-function optimizations;
- Section 4. tries to outline a possible workflow for implementing this new feature.

I am looking for any feedback or suggestions etc.

Thanks!
Andrea Di Biagio
SN Systems Ltd.
http://www.snsys.com

1. Description
==============

The idea is to add pragmas to control the optimization level on functions.

A similar approach has been implemented by GCC as well. Since GCC 4.4, new function-specific option pragmas have been added to allow users to set the optimization level on a per-function basis.

http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html describes the pragmas as

  #pragma GCC optimize ("string")
  #pragma GCC push_options
  #pragma GCC pop_options
  #pragma GCC reset_options

Instead of imitating GCC's syntax, I think it would be better to use a syntax consistent with the existing "#pragma clang diagnostic" directives:

  #pragma clang optimize push
  #pragma clang optimize "string"
  #pragma clang optimize pop

Each directive would have its own stack, which in my opinion keeps everything more modular and simpler to implement.

  #pragma clang optimize push
  #pragma clang optimize pop

An "optimize push" temporarily saves the current set of optimization options, while an "optimize pop" restores the previously saved set.

  #pragma clang optimize "string"

This pragma overrides the optimization level for functions defined later in the source code.
The argument "string" is a string that begins with 'O' and is interpreted as an optimization level (for example, "O0" for optimization level 0 and "O1" for optimization level 1). In the future we may also extend the set of accepted strings to allow other codegen options to be overridden for specific functions.

Example:
////
#pragma clang optimize push
#pragma clang optimize "O0"
void f1() { ... }
#pragma clang optimize push
#pragma clang optimize "O2"
void f2() { ... }
void f3() { ... }
#pragma clang optimize pop
void f4() { ... }
#pragma clang optimize pop
////

The optimization level for f1 and f4 is -O0.
The optimization level for f2 and f3 is -O2.

1.1 Why it is useful to define per-function optimization levels
===============================================================

The main motivation of our customers is to be able to selectively disable optimizations when debugging one function in a compilation unit, in cases where compiling the whole unit at -O0 would make the program run too slowly.

Being able to set the optimization level on a per-function basis can also help when we know that there is a problem in an optimization but, for some reason:
 a) we don't know which optimization is performing the wrong transformation; or
 b) we know the problematic pass, but there is no easy way to work around the problem and fixing it would take too much time; or
 c) there is an unknown error in the code being compiled that only causes problems when optimized (for example, the code breaks strict aliasing).

If we know that the bug only affects a few functions in the code, we could disable optimizations for those functions only. This would allow us to provide quick workarounds to customers encountering optimization bugs.

2. CHANGES REQUIRED IN clang
============================

Clang must be able to parse the new "#pragma clang optimize". The idea is that optimization levels would be encoded as IR attributes on functions.
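On the clang side, the push/pop behaviour from Section 1 amounts to a small stack of saved settings that the parser consults at each function definition. Below is a minimal sketch of that bookkeeping. Several things in it are assumptions and not part of the proposal or of clang. `OptimizePragmaState` is a hypothetical class, not an existing clang API. Only "O0" to "O3" are accepted. An empty `std::optional` means "no override, use the global level".

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical frontend-side model of "#pragma clang optimize"; a sketch,
// not an existing clang class. An empty optional means "no override, use
// the global (command line) optimization level".
class OptimizePragmaState {
public:
  // #pragma clang optimize push: save the current setting.
  void push() { Saved.push_back(Current); }

  // #pragma clang optimize pop: restore the most recently saved setting.
  void pop() {
    if (Saved.empty())
      throw std::logic_error("'optimize pop' without a matching 'push'");
    Current = Saved.back();
    Saved.pop_back();
  }

  // #pragma clang optimize "On": applies to functions defined from here on.
  // Assumption: only "O0" to "O3" are accepted in this sketch.
  void set(const std::string &Arg) {
    if (Arg.size() != 2 || Arg[0] != 'O' || Arg[1] < '0' || Arg[1] > '3')
      throw std::invalid_argument("unsupported optimize string: " + Arg);
    Current = static_cast<unsigned>(Arg[1] - '0');
  }

  // Queried at each function definition; the result would then be attached
  // to the function as an IR attribute (Section 2).
  std::optional<unsigned> levelForNextFunction() const { return Current; }

private:
  std::optional<unsigned> Current;            // no override by default
  std::vector<std::optional<unsigned>> Saved; // settings saved by "push"
};
```

The f1 to f4 example replays as follows: push(), set("O0"), push(), set("O2"), pop(), pop(). This gives level 0 for f1 and f4, level 2 for f2 and f3, and no override after the final pop. Keeping "no override" distinct from "O0" lets the frontend emit an attribute only for functions that actually request one.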
A discussion on how to encode optimization levels in LLVM was originally started by Chandler here:
lists.cs.uiuc.edu/pipermail/llvmdev/2013-January/058112.html

3. CHANGES REQUIRED IN LLVM
===========================

The global optimization level strongly affects how passes are added to PassManagers.

Example: when the global optimization level is -O0, method PassManagerBuilder::populateModulePassManager [in lib/Transforms/IPO/PassManagerBuilder.cpp] populates the per-module pass manager with the following passes:
- AlwaysInliner (if inlining is not disabled);
- extra passes which may have been registered as extensions "to be enabled at optimization level 0".

With an optimization level greater than zero, however, several analysis and transform passes are potentially added to the "per-module" pass manager.

The major problem with this approach is that both the optimizer and the backend assume that the set of codegen options is the same for all modules and functions. This also means that the sequence of passes to run is fixed for each optimization level and cannot be changed or adapted dynamically. If a FunctionPass is scheduled to run, it will always run on every function in the module (i.e. there is no way to control which passes run on a per-function basis).

One way to allow optimization levels to be defined on a per-function basis is to implement a "common" pipeline of passes for all optimization levels. Rather than statically composing the sequence of passes to run, we could teach pass managers to select dynamically which passes to run, based on knowledge of pass constraints.

A pass constraint could be used to specify at which optimization levels it is safe to run the pass. Constraints on passes could be made available, for example, through the global PassRegistry; the pass managers would then be able to query the registry to obtain the constraints.
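Below is a minimal sketch of how registered constraints might be stored and queried. It makes several assumptions:
- `PassConstraintRegistry` and `OptLevelConstraint` are hypothetical. The real PassRegistry has no such facility, and LLVM identifies passes by ID rather than by name.
- When several constraint sets are registered for one pass, the pass may run if any of them matches. The proposal does not pin this down.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical constraint: the pass may run when MinOpt <= level <= MaxOpt.
struct OptLevelConstraint {
  unsigned MinOpt;
  unsigned MaxOpt;
};

// Hypothetical stand-in for constraints kept alongside the global
// PassRegistry. Keyed by pass name here purely for readability.
class PassConstraintRegistry {
public:
  void addConstraint(const std::string &Pass, OptLevelConstraint C) {
    Constraints[Pass].push_back(C);
  }

  // A pass with no registered constraints may run at any level. Otherwise
  // (assumption) it may run if at least one registered constraint matches.
  bool isAllowed(const std::string &Pass, unsigned Level) const {
    auto It = Constraints.find(Pass);
    if (It == Constraints.end())
      return true;
    for (const OptLevelConstraint &C : It->second)
      if (C.MinOpt <= Level && Level <= C.MaxOpt)
        return true;
    return false;
  }

private:
  std::map<std::string, std::vector<OptLevelConstraint>> Constraints;
};
```

A pass manager would call `isAllowed` with the effective level of the function it is about to process, instead of relying on the pass having been left out of the pipeline.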
In conclusion, we could teach PassManagers how to retrieve constraints on passes and how to decide which passes to run, taking into account both:
- the information stored in pass constraints; and
- the optimization level associated with individual functions (if available).

3.1 How pass constraints can be used to select passes to run
------------------------------------------------------------

A pass with no constraints can always be run at any optimization level.

A pass P is run by a PassManager if and only if its constraints match the "effective" optimization level (defined below).

By default, the effective optimization level for all passes is equal to the global optimization level (i.e. the optimization level given on the command line).

The effective optimization level for a pass running on a function F (or on a basic block BB) is the optimization level specified by F (or by the function containing BB). If F does not specify an optimization level, the effective optimization level is equal to the global optimization level.

It is the responsibility of the pass manager to check the effective optimization level for all passes with a registered set of constraints.

Example:
--------
Consider the following sequence of passes: A, B, C, D, E.

The pass constraints are:
1. A is only run at OptLevel == 0;
2. B is only run at OptLevel > 0;
3. D is only run at OptLevel > 1.

Consider the following scenario, where:
- the global optimization level is 2; and
- there are three IR functions, namely Fun1, Fun2 and Fun3, where:
  * Fun1 does not override the default optimization level;
  * Fun2 overrides the optimization level to -O0;
  * Fun3 overrides the optimization level to -O1.

The table below shows which passes are expected to run on each function. A box with an 'X' means the pass is allowed to run on that function.
      \   A   B   C   D   E
        +---+---+---+---+---+
   Fun1 |   | X | X | X | X |
        +---+---+---+---+---+
   Fun2 | X |   | X |   | X |
        +---+---+---+---+---+
   Fun3 |   | X | X |   | X |
        +---+---+---+---+---+

In the case of Fun1, the effective optimization level is equal to the global optimization level (i.e. 2). Therefore the PassManager will skip pass A and run passes B, C, D, E on it.

In the case of Fun2, the effective optimization level is 0, since Fun2 overrides it. The PassManager will therefore run passes A, C, E on it.

In the case of Fun3, the effective optimization level is 1, so the PassManager will run passes B, C, E on it.

3.2 How to deal with size levels
--------------------------------

By default, clang sets the optimization level to 2 when either "-Os" or "-Oz" is specified. See, for example, how function `getOptimizationLevel' is implemented in clang (in file lib/Frontend/CompilerInvocation.cpp). This is also true for the 'opt' tool, but not for bugpoint, which only accepts options -O1, -O2, -O3 to control the optimization level.

In addition to "-Os" and "-Oz", clang also accepts option "-O". By default, "-O" sets the optimization level to 2.

Internally, clang differentiates between the optimization level and the "size level". Option "-Os" sets the SizeLevel to 1, while option "-Oz" sets the SizeLevel to 2.

Pass constraints should allow constraints to be defined on both the optimization level and the size level. The effective optimization level described in 3.1, which the pass managers use, must take into account both the optimization level and the size level.

3.3 How Pass Constraints could be implemented
---------------------------------------------

Constraints on the optimization level could be implemented as pairs of values of the form (minOptLevel, maxOptLevel), where:
- minOptLevel is the minimum allowed optimization level;
- maxOptLevel is the maximum allowed optimization level.
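The sketch below combines Sections 3.1 to 3.3 and reproduces the table above. It is a minimal sketch with several assumptions:
- Each constraint holds a (minOptLevel, maxOptLevel) pair plus an analogous (min, max) pair for the size level, in line with Section 3.2.
- A function override replaces both levels.
- `MaxOptLevel` is taken to be 3.
- `Levels`, `PassConstraint`, `PassDesc` and `passesToRun` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

constexpr unsigned MaxOptLevel = 3;  // assumption: -O3 is the highest level
constexpr unsigned MaxSizeLevel = 2; // -Oz

// (OptLevel, SizeLevel), following Section 3.2:
// -O2 -> {2, 0}, -Os -> {2, 1}, -Oz -> {2, 2}.
struct Levels {
  unsigned Opt;
  unsigned Size;
};

// Hypothetical constraint made of two (min, max) pairs, as in Section 3.3.
struct PassConstraint {
  unsigned MinOpt = 0, MaxOpt = MaxOptLevel;
  unsigned MinSize = 0, MaxSize = MaxSizeLevel;
  bool matches(Levels L) const {
    return MinOpt <= L.Opt && L.Opt <= MaxOpt && MinSize <= L.Size &&
           L.Size <= MaxSize;
  }
};

struct PassDesc {
  std::string Name;
  std::optional<PassConstraint> Constraint; // empty: runs at any level
};

// Effective levels: the function's own override if present, else global.
Levels effectiveLevels(Levels Global, std::optional<Levels> FnOverride) {
  return FnOverride ? *FnOverride : Global;
}

// Names of the passes in the common pipeline that would run on a function,
// concatenated into one string for easy comparison.
std::string passesToRun(const std::vector<PassDesc> &Pipeline, Levels Global,
                        std::optional<Levels> FnOverride) {
  Levels L = effectiveLevels(Global, FnOverride);
  std::string Result;
  for (const PassDesc &P : Pipeline)
    if (!P.Constraint || P.Constraint->matches(L))
      Result += P.Name;
  return Result;
}
```

In this model, A has MaxOpt = 0, B has MinOpt = 1, D has MinOpt = 2, and C and E are unconstrained. With a global level of {2, 0}, Fun1 (no override) gets "BCDE", Fun2 ({0, 0}) gets "ACE" and Fun3 ({1, 0}) gets "BCE", matching the table. A pass limited to opt (2,2) and size (1,1) matches -Os ({2, 1}) but not -O2 ({2, 0}).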
Similarly, constraints on the size level could be implemented as pairs of values of the form (minSizeLevel, maxSizeLevel).

Examples: a pass with optimization constraints (0,0) can only be run at -O0, while a pass with optimization constraints (1,MAXOPTLEVEL) can only be run at optimization level >= 1.

More than one set of constraints can be registered for each pass. For example, a pass with optimization constraints (2,2) and size constraints (1,1) can only be run at -Os (since "-Os" sets the optimization level to 2 and the size level to 1).

3.4 About the inlining strategy
-------------------------------

Currently, there are two strategies available in LLVM for function inlining:
1) Inline Always (by default only used at -O0 and -O1);
2) Inline Simple (OptLevel >= 2).

The Inline Always strategy can be used in place of Inline Simple if specifically requested by the user.

The constructor of SimpleInliner (see "lib/Transforms/IPO/InlineSimple.cpp") requires a Threshold value as an argument. In general, the threshold is set by the front-end (clang, bugpoint, opt, etc.) according to both the OptLevel and the SizeLevel.

In order to support per-function optimizations, we should modify the existing SimpleInliner so that the Threshold can adapt dynamically to changes in the effective optimization level.

As a future development, we might allow the inlining threshold to be set using the optimize pragma.

3.5 Backend changes
-------------------

Code generator passes would benefit from the same changes described in Section 3. A MachineFunctionPass is also a FunctionPass, which means that it should always be possible to specify optimization constraints for it.

Class TargetPassConfig (see "include/llvm/CodeGen/Passes.h") provides several methods for populating the pass manager with common CodeGen passes.
It is the responsibility of each target to override the default behavior of some of the methods exposed by the TargetPassConfig interface.

Unfortunately, changing how code generator passes are added to pass managers potentially requires changes to target-specific parts of the backend. Examples: file "Target/X86/X86TargetMachine.cpp"; file "Target/Sparc/SparcTargetMachine.cpp"; file "Target/PowerPC/PPCTargetMachine.cpp" etc.

In general, changes are required wherever the backend makes decisions based on the optimization level. More specifically, changes are required in the following components:
1. Instruction Selector:
   -- use the effective optimization level to decide whether FastISel should be enabled or disabled;
2. Register Allocator:
   -- select the register allocation strategy based on the effective optimization level;
3. CodeGen passes whose behavior is affected by the global optimization level:
   -- TwoAddressInstructionPass (lib/CodeGen/TwoAddressInstructionPass.cpp)
   -- PostRASchedulerList (lib/CodeGen/PostRASchedulerList.cpp)

4. Proposed Implementation Workflow
===================================

The proposed work is:
1. Add support for modeling constraints on passes:
   - The idea is to support constraints on optimization levels. In the future we could add support for constraints on other codegen options using the same framework;
2. Add support for registering constraints on passes in the PassRegistry;
3. Teach pass managers how to identify passes that are safe to run;
4. Adapt the existing SimpleInliner algorithm (or add a new algorithm);
5. Teach both the optimizer and the backend how to register constraints on passes;
6. Define (or reuse existing) IR attributes to decorate functions with optimization levels;
7. Teach Clang how to parse the new #pragma clang optimize and how to emit IR attributes for controlling the optimization level on functions.
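The instruction selector and register allocator items in Section 3.5 could reduce, per function, to a small decision on the effective optimization level. This is a simplified sketch under assumptions:
- The enums and `choicesForFunction` are hypothetical names.
- The sketch only applies per function the usual global rule: FastISel and the fast register allocator at -O0, SelectionDAG and the greedy allocator otherwise.
- On targets without FastISel support, or for instructions FastISel cannot handle, selection falls back to SelectionDAG; that fallback is not modeled here.

```cpp
#include <cassert>
#include <optional>

// Hypothetical per-function backend choices (Section 3.5, items 1 and 2).
enum class ISelKind { FastISel, SelectionDAG };
enum class RegAllocKind { Fast, Greedy };

struct CodeGenChoices {
  ISelKind ISel;
  RegAllocKind RegAlloc;
};

// Uses the function's own level if it has one, else the global level, and
// applies the -O0 vs. optimized split per function instead of per module.
CodeGenChoices choicesForFunction(unsigned GlobalOptLevel,
                                  std::optional<unsigned> FnOptLevel) {
  unsigned Effective = FnOptLevel.value_or(GlobalOptLevel);
  if (Effective == 0)
    return {ISelKind::FastISel, RegAllocKind::Fast};
  return {ISelKind::SelectionDAG, RegAllocKind::Greedy};
}
```

Under this model, a function marked "O0" inside an -O2 module would get FastISel and the fast allocator, while its neighbours keep the optimizing choices.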
Richard Smith
2013-Apr-24 14:50 UTC
[LLVMdev] [cfe-dev] [PROPOSAL] per-function optimization level control
On Wed, Apr 24, 2013 at 6:00 AM, <Andrea_DiBiagio at sn.scee.net> wrote:
> [...]
> Instead of imitating GCC's syntax, I think it would be better to use a
> syntax consistent with existing pragma clang diagnostics:
>
> #pragma clang optimize push
> #pragma clang optimize "string"
> #pragma clang optimize pop
>

Since the intent is to provide overrides on a per-function basis, have you considered using a function attribute instead of a pragma?
Eric Christopher
2013-Apr-24 14:55 UTC
[LLVMdev] [cfe-dev] [PROPOSAL] per-function optimization level control
Especially since we have support for per function code gen attributes now.

-eric

On Wed, Apr 24, 2013 at 3:50 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> On Wed, Apr 24, 2013 at 6:00 AM, <Andrea_DiBiagio at sn.scee.net> wrote:
>> [...]
>> #pragma clang optimize push
>> #pragma clang optimize "string"
>> #pragma clang optimize pop
>
> Since the intent is to provide overrides on a per-function basis, have you
> considered using a function attribute instead of a pragma?
Renato Golin
2013-Apr-24 20:19 UTC
[LLVMdev] [PROPOSAL] per-function optimization level control
On 24 April 2013 14:00, <Andrea_DiBiagio at sn.scee.net> wrote:
> In conclusion, we could teach PassManagers how to retrieve constraints on
> passes and which passes to run taking into account both:
> - the information stored on Pass Constraints and
> - the optimization level associated to single functions (if available);

I like this approach. Today, the way to know which passes are added is to look at the functions and follow the branches for O1, O2, etc. Your proposal is way cleaner and allows for a table-based approach.

It also makes it simpler to experiment with passes at different optimization levels on randomized benchmarks. I have often tried commenting out passes to identify bugs (that bugpoint wouldn't) and realized that doing so could cause many segmentation faults in the compiler, which is worrying...

> 3.1 How pass constraints can be used to select passes to run
> ------------------------------------------------------------
> It is the responsibility of the pass manager to check the effective
> optimization level for all passes with a registered set of constraints.

There is a catch here. Passes generally have unwritten dependencies which you cannot tell just by looking at the code. Things like "run DCE after PassFoo only if state of variable Bar is Baz" can sometimes only be found by going back to the commits that introduced them and finding that they were indeed introduced together, and that it's not just an artefact of code movement elsewhere.

The table I refer to above would have to record the dependencies (backwards and forwards), with possible condition code (a virtual method) to decide whether the pass has to run or not based on some context, in addition to the optimization levels at which it should run.

In theory, with that in place, it would just be a matter of listing all passes for O-N that nobody depends on and following all the dependencies to get the list of passes for the PassManager. Removing a pass from the O3 level would have to remove all the orphaned passes that it creates, too. Just like Linux package management. ;)

> Pass Constraints should allow the definition of constraints on both
> the optimization level and the size level.

Yes, AND to run, OR to not run.

> In order to support per-function optimizations, we should modify the
> existing SimpleInliner to allow adapting the Threshold dynamically based
> on changes in the effective optimization level.

This is a can of worms. A few years back, when writing our front-end, we found that since there were no tests of inline thresholds other than the hard-coded one, anything outside a small range around the hard-coded values would create codegen problems, segfaults, etc. It could be much better now, but I doubt it's well tested yet.

> As a future develelopment, we might allow setting the inlining threshold
> using the optimize pragma.

This, again, would be good for writing randomized tests. But before we have some coverage, I wouldn't venture on doing that in real code.

> Unfortunately changing how code generator passes are added to pass
> managers require that we potentially make changes on target specific parts
> of the backend.

Shouldn't be too hard, but you'll have to look closely at whether any back-end depends on optimization levels to define other specific properties (cascading dependencies).

> 4. Proposed Implementation Workflow

I think your workflow makes sense, and I agree that this is a nice feature (for many uses).

Thanks for looking into this!

cheers,
--renato