On Mon, Oct 11, 2010 at 12:30 PM, John McCall <rjmccall at apple.com> wrote:> On Oct 11, 2010, at 9:12 AM, Kenneth Uildriks wrote: >> 3. The front-end, recognizing that scribbling on an instance's vtbl >> pointer has undefined results, eliminated the loads of the vtbl >> pointer and replaced them with @classvtbl.TestVirtual. This would >> allow devirtualization within a function at least, but (I think) would >> do less to allow analysis to spot devirtualization opportunities >> across functions. (Although ArgPromotion could arrange to have the >> vtbl pointer passed in separately, and then inlining and/or partial >> specialization could be made to see that it's a pointer to a constant >> and thus add in the IndirectCallBonus) > > There are always going to be cases that can only be deduced with > language knowledge. We already do some frontend de-virtualization; > this may just be a missing case. Can you post your test? > > John. >I compiled the attached c++ code to bitcode using clang++ -O3, then ran it through opt -std-compile-opts. I got a reload of the vtbl pointer after the printf call. The compiler is allowed to cache the vtbl pointer as soon as the object is constructed or passed into a function, right? -------------- next part -------------- A non-text attachment was scrubbed... Name: TestVirtual.cpp Type: application/octet-stream Size: 366 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20101011/46973ee1/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: TestVirtual.h Type: application/octet-stream Size: 97 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20101011/46973ee1/attachment-0001.obj>
On Oct 11, 2010, at 10:43 AM, Kenneth Uildriks wrote:> On Mon, Oct 11, 2010 at 12:30 PM, John McCall <rjmccall at apple.com> wrote: >> On Oct 11, 2010, at 9:12 AM, Kenneth Uildriks wrote: >>> 3. The front-end, recognizing that scribbling on an instance's vtbl >>> pointer has undefined results, eliminated the loads of the vtbl >>> pointer and replaced them with @classvtbl.TestVirtual. This would >>> allow devirtualization within a function at least, but (I think) would >>> do less to allow analysis to spot devirtualization opportunities >>> across functions. (Although ArgPromotion could arrange to have the >>> vtbl pointer passed in separately, and then inlining and/or partial >>> specialization could be made to see that it's a pointer to a constant >>> and thus add in the IndirectCallBonus) >> >> There are always going to be cases that can only be deduced with >> language knowledge. We already do some frontend de-virtualization; >> this may just be a missing case. Can you post your test? > > I compiled the attached c++ code to bitcode using clang++ -O3, then > ran it through opt -std-compile-opts. I got a reload of the vtbl > pointer after the printf call. The compiler is allowed to cache the > vtbl pointer as soon as the object is constructed or passed into a > function, right?Yes. This is [basic.life]p7, which lists the conditions under which a pointer or reference to an object are considered to automatically forward to a new object that's created at that location (as opposed to formally still pointing at the destroyed object). One key constraint is that the objects have to be of identical type, i.e. their pointers would have to match exactly. So this is a fair optimization as long as we make sure that we keep pointers separate. For example, consider the following test case: struct A { virtual void foo(); }; struct B : A { virtual void foo(); }; void test() { A *a = new A(); a->foo(); // 1 a->foo(); // 2: can assume object type is still A a->~A(); A *b = new (a) B(); a->foo(); // 3: illegal use after end of lifetime b->foo(); // 4: okay, can assume object type is still B } Here we happen to be able to prove that 'a' and 'b' have identical values, but they nonetheless have different properties for this optimization because 'a' doesn't meet the criteria for having been forwarded, whereas 'b' points to the new object. Right now, we don't have any good machinery for doing this kind of language-specific optimization in the backend. Doing it in the front-end for your test case would require inter-statement analysis, which we've tried to avoid in clang. So I think the best bet for this optimization for now is to do it based on the knowledge that printf doesn't modify that particular argument. John. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20101011/ee16db77/attachment.html>
On Mon, Oct 11, 2010 at 3:55 PM, John McCall <rjmccall at apple.com> wrote:> > On Oct 11, 2010, at 10:43 AM, Kenneth Uildriks wrote: > > On Mon, Oct 11, 2010 at 12:30 PM, John McCall <rjmccall at apple.com> wrote: > > On Oct 11, 2010, at 9:12 AM, Kenneth Uildriks wrote: > > 3. The front-end, recognizing that scribbling on an instance's vtbl > > pointer has undefined results, eliminated the loads of the vtbl > > pointer and replaced them with @classvtbl.TestVirtual. This would > > allow devirtualization within a function at least, but (I think) would > > do less to allow analysis to spot devirtualization opportunities > > across functions. (Although ArgPromotion could arrange to have the > > vtbl pointer passed in separately, and then inlining and/or partial > > specialization could be made to see that it's a pointer to a constant > > and thus add in the IndirectCallBonus) > > There are always going to be cases that can only be deduced with > > language knowledge. We already do some frontend de-virtualization; > > this may just be a missing case. Can you post your test? > > I compiled the attached c++ code to bitcode using clang++ -O3, then > ran it through opt -std-compile-opts. I got a reload of the vtbl > pointer after the printf call. The compiler is allowed to cache the > vtbl pointer as soon as the object is constructed or passed into a > function, right? > > Yes. This is [basic.life]p7, which lists the conditions under which a > pointer > or reference to an object are considered to automatically forward to a new > object that's created at that location (as opposed to formally still > pointing > at the destroyed object). One key constraint is that the objects have to be > of identical type, i.e. their pointers would have to match exactly. So this > is > a fair optimization as long as we make sure that we keep pointers separate. > For example, consider the following test case: > struct A { virtual void foo(); }; > struct B : A { virtual void foo(); }; > void test() { > A *a = new A(); > a->foo(); // 1 > a->foo(); // 2: can assume object type is still A > a->~A(); > A *b = new (a) B(); > a->foo(); // 3: illegal use after end of lifetime > b->foo(); // 4: okay, can assume object type is still B > } > Here we happen to be able to prove that 'a' and 'b' have identical values, > but they nonetheless have different properties for this optimization because > 'a' doesn't meet the criteria for having been forwarded, whereas 'b' points > to > the new object. > Right now, we don't have any good machinery for doing this kind of > language-specific optimization in the backend. Doing it in the front-end > for your test case would require inter-statement analysis, which we've tried > to avoid in clang. > So I think the best bet for this optimization for now is to do it based on > the > knowledge that printf doesn't modify that particular argument. > John.A better way for a front-end to declare that vtbl-ptr-hacking is not expected and not supported is for it to emit llvm.invariant.start and llvm.invariant.end calls for it. I tried putting that into my test code as well, and found that support for it could use some work. In particular, MemoryDependenceAnalysis seems to be the main consumer of it, and it only recognizes it if the entire invariant region is between a pointer store and a pointer load... it scans backward from the load, looking for llvm.invariant.end and starts skipping everything that precedes it until it finds a matching llvm.invariant.begin. It seems to me that the generally expected way for these markers to behave is to allow the region cover the entire range of instructions during which the invariance holds, including any loads of the memory, which MemoryDependenceAnalysis does *not* support at present. At any rate, having the front-end add those markers for the instance vtbl pointer shouldn't hurt anything, and should enable devirtualization in a variety of situations, including some cross-function scenarios. In your example above, the llvm.lifetime.start and llvm.lifetime.end would also be useful. llvm.lifetime.end would go after the destructor call, and llvm.lifetime.start would go right before the constructor calls. MemoryDependenceAnalysis also has code to look for those markers, but I haven't tested it. It looks like my best plan of attack given what parts of the system I understand is to improve the handling of those markers. It looks to me like those markers are exactly what clang needs to express vptr value lifetime & invariance, allowing better back-end optimization and avoiding inter-statement analysis.