Matthieu Monrocq
2012-Sep-29 13:39 UTC
[LLVMdev] Inlining and virtualization in Clang/LLVM
Hello,

At the moment the devirtualization of calls is performed in Clang (as far as I understand), whilst inlining and constant propagation are the optimizer's (LLVM's) job. It is probably necessary for Clang to perform "some" devirtualization (the meaning of `final` is not known to LLVM), however all the machinery for determining the dynamic type of a variable seems redundant with LLVM, and is incomplete (in a way) when it is not performed *after* inlining and constant propagation. It therefore seems to me that we are missing optimization opportunities. Consider the following example program:

    #include <cstdio>

    struct Base { virtual void foo() = 0; };

    struct NothingDerived: Base { virtual void foo() {} };

    struct PrintDerived: Base { virtual void foo() { printf("Hello World!"); } };

    Base& select(int i) {
      static NothingDerived nd;
      static PrintDerived pd;
      if (i % 2 == 0) { return nd; }
      return pd;
    }

    int main() {
      Base& b = select(0);
      b.foo();
    }

which gives the following main function (using the "Try out LLVM and Clang" demo page):

    define i32 @main() uwtable {
    [...]
    _Z6selecti.exit:                        ; preds = %13, %10, %7
      %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived* @_ZZ6selectiE2nd to void (%struct.Base*)***), align 8
      %15 = load void (%struct.Base*)** %14, align 8
      tail call void %15(%struct.Base* getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0))
      ret i32 0
    }

LLVM trivially sees through the call to select and rightly deduces that we are dealing with NothingDerived. However it does not go one step further and directly select NothingDerived::foo. Instead it dutifully performs all the bitcasting / pointer arithmetic necessary to reach the function pointer stored in the v-table and calls through that pointer.

I understand it would be awkward to make LLVM aware of the virtual table implementation, especially since even in C++ it varies from one implementation to another. However it seems to me that LLVM could still perform this optimization:

- Having deduced the exact object being used (select(int)::nd), LLVM should be able to get directly to its v-ptr (the first field of Base):

    %struct.NothingDerived = type { %struct.Base }
    %struct.Base = type { i32 (...)** }

- The v-ptr (after construction) always points to the same v-table, which is a constant:

    store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)*** getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0, i32 0), align 8

- The offset into the v-table is "static":

    getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0)

- The v-table being constant, what is stored at that offset is perfectly deducible as well:

    @_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*] [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to i8*), i8* bitcast (void (%struct.NothingDerived*)* @_ZN14NothingDerived3fooEv to i8*)]

So the question is, what is lacking for LLVM to perform this optimization?

- Is it the loss of information from having the v-table stored as a "blob" of bytes? (Which would mean that Clang should pass more typed information, without changing the exact layout, obviously, given the ABI constraints.)
- Or is it something internal to LLVM (the information is somehow irremediably lost)?
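To make the hoped-for result concrete, here is a hand-written sketch of what main could be folded to once both loads are resolved against the constant v-table. This is not actual clang/LLVM output, only an illustration of the folding described above; all the symbol names come from the IR quoted earlier.

    define i32 @main() uwtable {
    entry:
      ; the two loads and the indirect call collapse into a direct call, since
      ; @_ZZ6selectiE2nd's v-ptr is known to point into @_ZTV14NothingDerived,
      ; whose slot 2 is @_ZN14NothingDerived3fooEv
      tail call void @_ZN14NothingDerived3fooEv(%struct.NothingDerived* @_ZZ6selectiE2nd)
      ret i32 0
    }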
I admit that reducing the virtual call overhead is probably not really worth it (in general); however, devirtualizing calls also exposes more inlining/context opportunities, and it's hard (for me) to quantify what such an optimization could bring here. We should also consider the simplification in Clang (and other frontends) if LLVM could perform the job on its own.

-- Matthieu
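As an illustration of the inlining point (again a hand-written sketch, assuming the direct call sketched earlier): NothingDerived::foo has an empty body, so once the call is direct the inliner could eliminate it entirely and main would shrink to:

    define i32 @main() uwtable {
    entry:
      ; NothingDerived::foo() {} inlined away; nothing observable remains
      ret i32 0
    }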
Krzysztof Parzyszek
2012-Sep-30 03:15 UTC
[LLVMdev] Inlining and virtualization in Clang/LLVM
On 9/29/2012 8:39 AM, Matthieu Monrocq wrote:
> I admit that reducing the virtual call overhead is probably not really
> worth it (in general); however, devirtualizing calls also exposes more
> inlining/context opportunities, and it's hard (for me) to quantify what
> such an optimization could bring here.

More precise aliasing information, eliminating the need for a thunk (or the likes of it), removing the unpredictability of the indirect branch, etc. If you can get rid of an indirect branch in favor of a direct one, it's a win.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
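A small illustration of the aliasing point (my own fragments, not taken from the thread; %x, %obj, %nd and %fptr are made-up names):

    ; With an indirect call the optimizer must assume the unknown callee may
    ; write through %x, so the value has to be reloaded afterwards:
      %a = load i32* %x
      call void %fptr(%struct.Base* %obj)
      %b = load i32* %x

    ; Once the callee is known, its body and attributes are visible; if it does
    ; not touch *%x the reload can be dropped and %a reused, and the direct call
    ; becomes a candidate for inlining and ordinary branch prediction:
      %c = load i32* %x
      call void @_ZN14NothingDerived3fooEv(%struct.NothingDerived* %nd)
      ; ... subsequent uses can keep using %c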
Under -rtti you may be able to see most of the relevant vtable structure, if not all of it (I haven't verified this for clang/llvm though). You can piggyback on that for devirtualization. Even when -nortti is given, you can force -rtti internally, do your devirtualization, and then drop the RTTI info. However, this technique is patented.

- dibyendu