David Blaikie
2012-Sep-30 15:04 UTC
[LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM
I'm not sure whether this is the exact problem at hand in your example, but one of the major hurdles llvm suffers when trying to devirtualize is the second point you made: it doesn't see the invariance of the table pointer post construction. In your specific example the constructors are trivial and inlinable so it I'm not sure why llvm would be having trouble proving the value of the vptr & devirtualizing. , perhaps due to them being static so the initialization is contingent on it being the first call (& llvm doesn't know that the vptr is constant after construction until destruction begins) & doesn't see the connection across all calls. So there are a few issues that need to be addressed to help this. One is some kind of constant range where the front end can promise not to modify certain values for a range (this might not be correct though - I remember some argument as to whether it was valid to destroy and then placement new the same type into that memory before the object goes out of scope - if so, then it's not obvious if the vptr is constant in any range) & secondly (at least for the case of non inline constructors) the ability to provide some kind of assert that, post construction, the vptr is equal to some known constant. (this assertion is currently possible with a branch to unreachable, except llvm throws out the unreachable code in SimplyCFG before any optimizations run - so we need to see if we can keep them around longer - there are some PRs filed for this but I haven't made much progress on it From: Matthieu Monrocq Sent: 9/29/2012 4:41 PM To: cfe-dev at cs.uiuc.edu; llvmdev Subject: [cfe-dev] Inlining and virtualization in Clang/LLVM Hello, at the moment the devirtualization of calls is performed in Clang (as far as I understand) whilst inlining and constant propagation are the optimizer's (LLVM) job. It is probably necessary for Clang to perform "some" devirtualization (the meaning of `final` is not known to LLVM), however all the stuff to determine what the dynamic type of a variable is seems redundant with LLVM, and is incomplete (in a way) when it's not perform *after* inlining and constant propagation. It seems to me therefore that because of this we are missing optimization opportunities. Suppose the following example program: #include <cstdio> struct Base { virtual void foo() = 0; }; struct NothingDerived: Base { virtual void foo() {} }; struct PrintDerived: Base { virtual void foo() { printf("Hello World!"); } }; Base& select(int i) { static NothingDerived nd; static PrintDerived pd; if (i % 2 == 0) { return nd; } return pd; } int main() { Base& b = select(0); b.foo(); } Which gives the following main function (using Try out LLVM and Clang): define i32 @main() uwtable { [...] _Z6selecti.exit: ; preds = %13, %10, %7 %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived* @_ZZ6selectiE2nd to void (%struct.Base*)***), align 8 %15 = load void (%struct.Base*)** %14, align 8 tail call void %15(%struct.Base* getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0)) ret i32 0 } LLVM trivially sees through the select call and rightly deduce that we have to do with NothingDerived. However it does not go any step further and directly select NothingDerived::foo's function. Instead it dutifully performs all the bitcasting / pointer arithmetic necessary to access the pointer to function stored in the VTable and calls it through a pointer to function. I understand it would be awkward to have LLVM be aware of the virtual table implementation. Especially since even in C++ it varies from one implementation to another. However it seems to me that LLVM could still perform this optimization: - LLVM having deduced the exact value to use (select(int)::nd) should be able to get directly to its v-ptr (the first field of Base) %struct.NothingDerived = type { %struct.Base } %struct.Base = type { i32 (...)** } - the v-ptr (after construction) always point to the same v-table, which is a constant store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)*** getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0, i32 0), align 8 - the offset in the v-table is "static" getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0) - the v-table being constant, what is stored at that offset is perfectly deducible as well @_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*] [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to i8*), i8* bitcast (void (%struct.NothingDerived*)* @_ZN14NothingDerived3fooEv to i8*)] So the question is, what is lacking for LLVM to perform this optimization ? - Is it because of the loss of information in having the v-table stored as a "blob" of bytes ? (which means that Clang should pass more typed information, without changing the exact layout obviously given the ABI constraints) - Or is it something internal to LLVM (the information is somehow irremediably lost) ? I admit that reducing the virtual call overhead is probably not really worth it (in general), however devirtualizing calls also expose more inlining/context opportunities and it's hard (for me) to quantify what such an optimization could bring here. We should also consider the simplification in Clang (and other frontends) if LLVM could perform the job on its own. -- Matthieu _______________________________________________ cfe-dev mailing list cfe-dev at cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
Matthieu Monrocq
2012-Oct-03 19:01 UTC
[LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM
On Sun, Sep 30, 2012 at 5:04 PM, David Blaikie <dblaikie at gmail.com> wrote:> I'm not sure whether this is the exact problem at hand in your example, > but one of the major hurdles llvm suffers when trying to devirtualize > is the second point you made: it doesn't see the invariance of the > table pointer post construction. In your specific example the > constructors are trivial and inlinable so it I'm not sure why llvm > would be having trouble proving the value of the vptr & > devirtualizing. , perhaps due to them being static so the > initialization is contingent on it being the first call (& llvm doesn't > know that the vptr is constant after construction until destruction > begins) & doesn't see the connection across all calls. > > So there are a few issues that need to be addressed to help this. > > One is some kind of constant range where the front end can promise not > to modify certain values for a range (this might not be correct though > - I remember some argument as to whether it was valid to destroy and > then placement new the same type into that memory before the object > goes out of scope - if so, then it's not obvious if the vptr is > constant in any range) & secondly (at least for the case of non inline > constructors) the ability to provide some kind of assert that, post > construction, the vptr is equal to some known constant. (this assertion > is currently possible with a branch to unreachable, except llvm throws > out the unreachable code in SimplyCFG before any optimizations run - so > we need to see if we can keep them around longer - there are some PRs > filed for this but I haven't made much progress on it > From: Matthieu Monrocq > Sent: 9/29/2012 4:41 PM > To: cfe-dev at cs.uiuc.edu; llvmdev > Subject: [cfe-dev] Inlining and virtualization in Clang/LLVM > Hello, > > at the moment the devirtualization of calls is performed in Clang (as > far as I understand) whilst inlining and constant propagation are the > optimizer's (LLVM) job. > > It is probably necessary for Clang to perform "some" devirtualization > (the meaning of `final` is not known to LLVM), however all the stuff > to determine what the dynamic type of a variable is seems redundant > with LLVM, and is incomplete (in a way) when it's not perform *after* > inlining and constant propagation. > > > It seems to me therefore that because of this we are missing > optimization opportunities. Suppose the following example program: > > #include <cstdio> > > struct Base { virtual void foo() = 0; }; > > struct NothingDerived: Base { virtual void foo() {} }; > > struct PrintDerived: Base { virtual void foo() { printf("Hello World!"); } }; > > Base& select(int i) { > static NothingDerived nd; > static PrintDerived pd; > if (i % 2 == 0) { return nd; } > return pd; > } > > int main() { > Base& b = select(0); > b.foo(); > } > > > Which gives the following main function (using Try out LLVM and Clang): > > define i32 @main() uwtable { > [...] > > _Z6selecti.exit: ; preds = %13, %10, %7 > %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived* > @_ZZ6selectiE2nd to void (%struct.Base*)***), align 8 > %15 = load void (%struct.Base*)** %14, align 8 > tail call void %15(%struct.Base* getelementptr inbounds > (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0)) > ret i32 0 > } > > > LLVM trivially sees through the select call and rightly deduce that we > have to do with NothingDerived. However it does not go any step > further and directly select NothingDerived::foo's function. Instead it > dutifully performs all the bitcasting / pointer arithmetic necessary > to access the pointer to function stored in the VTable and calls it > through a pointer to function. > > I understand it would be awkward to have LLVM be aware of the virtual > table implementation. Especially since even in C++ it varies from one > implementation to another. However it seems to me that LLVM could > still perform this optimization: > > - LLVM having deduced the exact value to use (select(int)::nd) should > be able to get directly to its v-ptr (the first field of Base) > > %struct.NothingDerived = type { %struct.Base } > %struct.Base = type { i32 (...)** } > > > - the v-ptr (after construction) always point to the same v-table, > which is a constant > > store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* > @_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)*** > getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 > 0, i32 0, i32 0), align 8 > > > - the offset in the v-table is "static" > > getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0) > > > - the v-table being constant, what is stored at that offset is > perfectly deducible as well > > @_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*] > [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to > i8*), i8* bitcast (void (%struct.NothingDerived*)* > @_ZN14NothingDerived3fooEv to i8*)] > > > So the question is, what is lacking for LLVM to perform this optimization ? > > - Is it because of the loss of information in having the v-table > stored as a "blob" of bytes ? (which means that Clang should pass more > typed information, without changing the exact layout obviously given > the ABI constraints) > > - Or is it something internal to LLVM (the information is somehow > irremediably lost) ? > > > I admit that reducing the virtual call overhead is probably not really > worth it (in general), however devirtualizing calls also expose more > inlining/context opportunities and it's hard (for me) to quantify what > such an optimization could bring here. We should also consider the > simplification in Clang (and other frontends) if LLVM could perform > the job on its own. > > -- Matthieu > _______________________________________________ > cfe-dev mailing list > cfe-dev at cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-devHello David, I have been thinking a bit more about this and as I find the problem quite interesting. If I get down one level (and revert to implementing objects in C): #include <stdio.h> typedef void (*Function)(); void print() { printf("Hello, World!"); } void nothing() {} Function get(int i) { if (i % 2 == 0) { return &print; } return ¬hing; } int main() { Function f = get(2); (*f)(); return 0; } Which is admittedly quite similar, generates the following main: define i32 @main() nounwind uwtable { %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([14 x i8]* @.str, i64 0, i64 0)) nounwind ret i32 0 } This falls out naturally from SSA form, I think. No "constification" is necessary. However in our case, it seems that the SSA form (which only references the pointer to the structure itself, not the v-table pointer), is not sufficient therefore to allow the optimizer to remember which value the pointer should have. Of course, in general I agree that the compiler should be told explicitly that the value cannot change (otherwise it would not know that opaque calls are not accessing that particular value within the structure); at least until the start of the destructor. I remember a discussion a while ago involving the design and implementation of those "const-ranges" within the LLVM IR, do you happen to know where it's at ? -- Matthieu
David Blaikie
2012-Oct-03 20:30 UTC
[LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM
On Wed, Oct 3, 2012 at 12:01 PM, Matthieu Monrocq <matthieu.monrocq at gmail.com> wrote:> On Sun, Sep 30, 2012 at 5:04 PM, David Blaikie <dblaikie at gmail.com> wrote: >> I'm not sure whether this is the exact problem at hand in your example, >> but one of the major hurdles llvm suffers when trying to devirtualize >> is the second point you made: it doesn't see the invariance of the >> table pointer post construction. In your specific example the >> constructors are trivial and inlinable so it I'm not sure why llvm >> would be having trouble proving the value of the vptr & >> devirtualizing. , perhaps due to them being static so the >> initialization is contingent on it being the first call (& llvm doesn't >> know that the vptr is constant after construction until destruction >> begins) & doesn't see the connection across all calls. >> >> So there are a few issues that need to be addressed to help this. >> >> One is some kind of constant range where the front end can promise not >> to modify certain values for a range (this might not be correct though >> - I remember some argument as to whether it was valid to destroy and >> then placement new the same type into that memory before the object >> goes out of scope - if so, then it's not obvious if the vptr is >> constant in any range) & secondly (at least for the case of non inline >> constructors) the ability to provide some kind of assert that, post >> construction, the vptr is equal to some known constant. (this assertion >> is currently possible with a branch to unreachable, except llvm throws >> out the unreachable code in SimplyCFG before any optimizations run - so >> we need to see if we can keep them around longer - there are some PRs >> filed for this but I haven't made much progress on it >> From: Matthieu Monrocq >> Sent: 9/29/2012 4:41 PM >> To: cfe-dev at cs.uiuc.edu; llvmdev >> Subject: [cfe-dev] Inlining and virtualization in Clang/LLVM >> Hello, >> >> at the moment the devirtualization of calls is performed in Clang (as >> far as I understand) whilst inlining and constant propagation are the >> optimizer's (LLVM) job. >> >> It is probably necessary for Clang to perform "some" devirtualization >> (the meaning of `final` is not known to LLVM), however all the stuff >> to determine what the dynamic type of a variable is seems redundant >> with LLVM, and is incomplete (in a way) when it's not perform *after* >> inlining and constant propagation. >> >> >> It seems to me therefore that because of this we are missing >> optimization opportunities. Suppose the following example program: >> >> #include <cstdio> >> >> struct Base { virtual void foo() = 0; }; >> >> struct NothingDerived: Base { virtual void foo() {} }; >> >> struct PrintDerived: Base { virtual void foo() { printf("Hello World!"); } }; >> >> Base& select(int i) { >> static NothingDerived nd; >> static PrintDerived pd; >> if (i % 2 == 0) { return nd; } >> return pd; >> } >> >> int main() { >> Base& b = select(0); >> b.foo(); >> } >> >> >> Which gives the following main function (using Try out LLVM and Clang): >> >> define i32 @main() uwtable { >> [...] >> >> _Z6selecti.exit: ; preds = %13, %10, %7 >> %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived* >> @_ZZ6selectiE2nd to void (%struct.Base*)***), align 8 >> %15 = load void (%struct.Base*)** %14, align 8 >> tail call void %15(%struct.Base* getelementptr inbounds >> (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0)) >> ret i32 0 >> } >> >> >> LLVM trivially sees through the select call and rightly deduce that we >> have to do with NothingDerived. However it does not go any step >> further and directly select NothingDerived::foo's function. Instead it >> dutifully performs all the bitcasting / pointer arithmetic necessary >> to access the pointer to function stored in the VTable and calls it >> through a pointer to function. >> >> I understand it would be awkward to have LLVM be aware of the virtual >> table implementation. Especially since even in C++ it varies from one >> implementation to another. However it seems to me that LLVM could >> still perform this optimization: >> >> - LLVM having deduced the exact value to use (select(int)::nd) should >> be able to get directly to its v-ptr (the first field of Base) >> >> %struct.NothingDerived = type { %struct.Base } >> %struct.Base = type { i32 (...)** } >> >> >> - the v-ptr (after construction) always point to the same v-table, >> which is a constant >> >> store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* >> @_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)*** >> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 >> 0, i32 0, i32 0), align 8 >> >> >> - the offset in the v-table is "static" >> >> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0) >> >> >> - the v-table being constant, what is stored at that offset is >> perfectly deducible as well >> >> @_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*] >> [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to >> i8*), i8* bitcast (void (%struct.NothingDerived*)* >> @_ZN14NothingDerived3fooEv to i8*)] >> >> >> So the question is, what is lacking for LLVM to perform this optimization ? >> >> - Is it because of the loss of information in having the v-table >> stored as a "blob" of bytes ? (which means that Clang should pass more >> typed information, without changing the exact layout obviously given >> the ABI constraints) >> >> - Or is it something internal to LLVM (the information is somehow >> irremediably lost) ? >> >> >> I admit that reducing the virtual call overhead is probably not really >> worth it (in general), however devirtualizing calls also expose more >> inlining/context opportunities and it's hard (for me) to quantify what >> such an optimization could bring here. We should also consider the >> simplification in Clang (and other frontends) if LLVM could perform >> the job on its own. >> >> -- Matthieu >> _______________________________________________ >> cfe-dev mailing list >> cfe-dev at cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > > Hello David, > > I have been thinking a bit more about this and as I find the problem > quite interesting. > > If I get down one level (and revert to implementing objects in C): > > #include <stdio.h> > > typedef void (*Function)(); > > void print() { printf("Hello, World!"); } > void nothing() {} > > Function get(int i) { > if (i % 2 == 0) { return &print; } > return ¬hing; > } > > int main() { > Function f = get(2); > (*f)(); > return 0; > } > > Which is admittedly quite similar, generates the following main: > > define i32 @main() nounwind uwtable { > %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds > ([14 x i8]* @.str, i64 0, i64 0)) nounwind > ret i32 0 > } > > This falls out naturally from SSA form, I think. No "constification" > is necessary. > > However in our case, it seems that the SSA form (which only references > the pointer to the structure itself, not the v-table pointer), is not > sufficient therefore to allow the optimizer to remember which value > the pointer should have. > > Of course, in general I agree that the compiler should be told > explicitly that the value cannot change (otherwise it would not know > that opaque calls are not accessing that particular value within the > structure); at least until the start of the destructor. > > I remember a discussion a while ago involving the design and > implementation of those "const-ranges" within the LLVM IR, do you > happen to know where it's at ?I'm not sure where that is, but I've +nlewycky because he's been thinking about this recently & there are some wrinkles as I alluded to in my reply. Saying that the vtable pointer is const from post-construction to pre-destruction isn't quite correct & would break certain obscure but valid programs. That being said, there is some kind of lifetime marker we could create & use to indicate the semantics of the vtable pointer without breaking those programs. (key here is that C++ cares about how you derive the pointer to the object, not just its value - if you explicitly destroy an object then placement new another object over the same space, the implementation is allowed to assume that the old pointers you had to the object still point to the same kind of object - but the one returned from placement new isn't, even though those pointers may have the same value... that's my understanding/vague description of the issue) - David
Possibly Parallel Threads
- [LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM
- [LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM
- [LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM
- [LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM
- [LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM