I asked this question before, but wasn't satisfied with answers.

How can (expert) users control inlining in llvm? gcc has these parameters: -finline-limit, --param max-inline-insns-single, --param inline-unit-growth, etc. What are the llvm equivalents?

While running large and complex industrial processes, I found that inlining can significantly change the speed of individual processes. Usually the more inlining there is, the faster the process runs. In gcc I actually was setting insanely high inlining values because that's what usually gave the fastest code, even though there should be some limit after which speed should theoretically degrade.

Now it looks like there are no llvm equivalents.

Answers given before are these: compiler should know better how to inline, users should trust compiler to "do the right thing", such options might not be stable between versions, compiler should just be faster without any such flags, many users will be tempted to set some values they don't understand and they will be floating in their makefiles forever without meaning.

For someone who is after the wall clock time such answers are naive. Compiler can't predict what heuristics the resulting code will exhibit under particular conditions. Maybe I want to inline 2X or 5X more than -O3 allows, and I am willing to spend this CPU time on compile and see. There are customers for which 10% improvement means a lot of difference. Why does llvm take away such choice from the users?

This lack of inlining tuning variables is a sticking point for me in switching to clang.

Yuri
I am at a similar juncture and plan to tweak -inline-threshold for a start.

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Yuri
Sent: Tuesday, July 01, 2014 12:00 AM
To: LLVM Developers Mailing List
Subject: [LLVMdev] How to control inlining in llvm?

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
In general, LLVM-based tools have a -help-hidden option that shows all the parameters that can be tuned. opt -help-hidden lists the following:

  -inline-threshold=<int>     - Control the amount of inlining to perform (default = 225)
  -inlinehint-threshold=<int> - Threshold for inlining functions with inline hint

clang lets you pass such options on to LLVM with -mllvm <option>. I guess you're looking for something like:

  clang -mllvm -inline-threshold=100000 ...

Cheers,
Roel

On 30/06/14 20:30, Yuri wrote:
> How can (expert) users control inlining in llvm? gcc has these
> parameters: -finline-limit, --param max-inline-insns-single, --param
> inline-unit-growth, etc. What are the llvm equivalents?
On Mon, Jun 30, 2014 at 11:30:29AM -0700, Yuri wrote:
> While running large and complex industrial processes, I found that inlining
> can significantly change the speed of individual processes. Usually the more
> inlining there is, the faster the process runs. In gcc I actually was
> setting insanely high inlining values because that's what usually gave the
> fastest code, even though there should be some limit after which speed
> should theoretically degrade.
>
> Now it looks like there are no llvm equivalents.

You can change the inline threshold with -inline-threshold. In LLVM there is a heuristic that, for each function call, computes a guess of how expensive inlining this function will be (in terms of code size etc.). If the cost is smaller than the threshold, the function will get inlined.

> For someone who is after the wall clock time such answers are naive.
> Compiler can't predict what heuristics the resulting code will exhibit under
> particular conditions. Maybe I want to inline 2X or 5X more than -O3 allows,
> and I am willing to spend this CPU time on compile and see. There are
> customers for which 10% improvement means a lot of difference. Why does llvm
> take away such choice from the users?

There's a lot of fine-grained control over inlining. Individual functions can be marked with function attributes [1]: alwaysinline and inlinehint. In clang, you can mark functions with __attribute__((always_inline)) and they will be inlined regardless of global settings and optimization level. I think GCC also supports that attribute.

I think that clang will mark functions declared as "inline" in C with an inlinehint attribute. Inlining of these functions can be controlled with a separate threshold "-inlinehint-threshold" which can be bigger than the main threshold.
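The threshold comparison described above can be pictured with a small toy model. This is only a sketch and not LLVM's actual code: the struct, the function names, and the rule that a hinted callee uses the larger of the two thresholds are all assumptions made for illustration. Only the "cost smaller than threshold" test and the existence of a separate hint threshold come from the explanation above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. Every name here is hypothetical; none of this is LLVM API. */
struct call_site {
    int  cost;            /* estimated cost of inlining this call */
    bool has_inline_hint; /* callee carries the inlinehint attribute */
};

/* Pick the threshold that applies to one call site.
 * Assumption: the hint threshold is used only for hinted callees, and only
 * when it is larger than the main threshold. */
static int effective_threshold(const struct call_site *cs,
                               int inline_threshold,
                               int inlinehint_threshold)
{
    assert(cs != NULL);
    if (cs->has_inline_hint && inlinehint_threshold > inline_threshold)
        return inlinehint_threshold;
    return inline_threshold;
}

/* Inline when the estimated cost is smaller than the applicable threshold. */
static bool should_inline(const struct call_site *cs,
                          int inline_threshold,
                          int inlinehint_threshold)
{
    return cs->cost < effective_threshold(cs, inline_threshold,
                                          inlinehint_threshold);
}
```

In this model, raising the main threshold lets more call sites through in general, while raising only the hint threshold affects hinted callees alone. The real cost computation is far more involved than a single stored integer.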
[1] http://llvm.org/docs/LangRef.html#function-attributes

> This lack of inlining tuning variables is a sticking point for me in
> switching to clang.

You can start by experimenting with:

  clang <args> -mllvm -inline-threshold=<n>

The greater the parameter <n>, the more aggressive the inlining will be. The default is 225, so set it to something bigger. Expect big code size and long compilation times with very aggressive inlining.

When you hit the point of diminishing returns, you can try profiling the code, looking for frequently called but uninlined functions, and marking them with __attribute__((always_inline)) for even more inlining.

If you have functions marked "inline", you can also experiment with an -inlinehint-threshold bigger than -inline-threshold and see whether this changes anything.

Also, are you compiling with link-time optimizations? Without them inlining is limited to individual compilation units.

Regards,
Tomasz D.
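As a rough illustration of the two source-level markings mentioned in this message, here is a minimal C sketch. The function names and the "hot loop" are made up for the example. Whether clang really tags the plain "inline" helper with inlinehint is taken from the message above and not verified here.

```c
#include <assert.h>
#include <stddef.h>

/* Forced inlining: always_inline asks the compiler to inline this helper
 * regardless of the inline threshold or optimization level. */
static inline __attribute__((always_inline)) double square(double x)
{
    return x * x;
}

/* Plain "inline": according to the thread, clang is believed to treat this
 * as a hint, so -inlinehint-threshold would be the knob that applies. */
static inline double scale(double x, double k)
{
    return x * k;
}

/* Hypothetical hot loop of the kind a profiler might point at. */
double weighted_sum_of_squares(const double *xs, size_t n, double k)
{
    assert(xs != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += scale(square(xs[i]), k);
    return total;
}
```

Assuming this lives in a file called hot.c (a made-up name), one way to try it would be `clang -O3 -mllvm -inline-threshold=1000 -mllvm -inlinehint-threshold=2000 -c hot.c`, with each LLVM option given its own -mllvm. The numbers are arbitrary starting points and not recommendations.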