Hello all, Our compilers class has been using LLVM, and a partner and I decided to implement devirtualization of virtual C++ calls in LLVM as a class project. We quickly realized that existing debug metadata generated by Clang didn't give us enough info to (precisely) implement this, and as such have already begun modifying Clang to insert such metadata. However, for devirtualization we also need to reconstruct the class hierarchy, which we also seek to do through metadata. There appears to be sufficient metadata to do this, but we can't seem to figure out how to actually access this metadata and successfully reconstruct the class hierarchy. Can anyone help with this? We're also open to alternative approaches, but we'd like to stay in LLVM IR as much as possible. Thanks. -Vitor -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111208/e5dfa64d/attachment.html>
On Thu, Dec 8, 2011 at 9:56 AM, Vitor Luis Menezes <vitor at utexas.edu> wrote:> Hello all, > > Our compilers class has been using LLVM, and a partner and I decided to > implement devirtualization of virtual C++ calls in LLVM as a class project. > We quickly realized that existing debug metadata generated by Clang didn't > give us enough info to (precisely) implement this, and as such have already > begun modifying Clang to insert such metadata. However, for devirtualization > we also need to reconstruct the class hierarchy, which we also seek to do > through metadata. There appears to be sufficient metadata to do this, but we > can't seem to figure out how to actually access this metadata and > successfully reconstruct the class hierarchy. Can anyone help with this? > > We're also open to alternative approaches, but we'd like to stay in LLVM IR > as much as possible.Some of this is already done by LLVM/Clang - do you have particular cases that aren't being devirtualized that you want to focus on? For some brief background reading you might want to take a look at these bugs: http://llvm.org/bugs/show_bug.cgi?id=3100 http://llvm.org/bugs/show_bug.cgi?id=810 Which are things I (& others more experienced than myself) have posted some thoughts on. If you're interested in pursuing PR810 then I have some code that Nick Lewycky passed on to me involving his experiments in this domain. & I'm also going to CC Eric Christopher because he mentioned he'd had some thoughts on how to achieve this (the general problem described in 810 about how to pass assumptions/facts from the frontend to the backend) & I never got around to asking him about the details. This approach should stay even more in LLVM IR than your proposed solution of metadata or debug info, but it may have limitations/problems that your proposed approach does not - so I certainly wouldn't rule anything out just yet. - David
We've got the following test case: class A { public: int x; A(int x) : x(x) {} int hoo() {return 4;} virtual int foo() {return x;} virtual int goo() {return foo()+10;} virtual int operator+(A &a) { return x + a.x; } }; class B : public A { public: B(int x) : A(x) {} int hoo() {return 2;} virtual int foo() {return A::foo()*2;} }; int main() { A* a = new A(1); B* b = new B(2); int y = a->foo() + b->goo() + a->hoo() + b->hoo() + (*a + *b); delete a; delete b; return y; } Clang with -O4 -S -emit-llvm emits the following for main(): define i32 @main() { %1 = tail call noalias i8* @_Znwm(i64 16), !dbg !70 %2 = bitcast i8* %1 to %class.A*, !dbg !70 tail call void @llvm.dbg.value(metadata !{%class.A* %2}, i64 0, metadata !68) tail call void @llvm.dbg.value(metadata !71, i64 0, metadata !69) tail call void @llvm.dbg.value(metadata !{%class.A* %2}, i64 0, metadata !66) tail call void @llvm.dbg.value(metadata !71, i64 0, metadata !67) %3 = bitcast i8* %1 to i32 (...)*** store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* @_ZTV1A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %3, align 8 %4 = getelementptr inbounds i8* %1, i64 8 %5 = bitcast i8* %4 to i32* store i32 1, i32* %5, align 4, !tbaa !72 tail call void @llvm.dbg.value(metadata !{%class.A* %2}, i64 0, metadata !49), !dbg !70 %6 = tail call noalias i8* @_Znwm(i64 16), !dbg !75 tail call void @llvm.dbg.value(metadata !{null}, i64 0, metadata !57) tail call void @llvm.dbg.value(metadata !76, i64 0, metadata !58) tail call void @llvm.dbg.value(metadata !{null}, i64 0, metadata !59) tail call void @llvm.dbg.value(metadata !76, i64 0, metadata !60) tail call void @llvm.dbg.value(metadata !{null}, i64 0, metadata !66) tail call void @llvm.dbg.value(metadata !76, i64 0, metadata !67) %7 = getelementptr inbounds i8* %6, i64 8 %8 = bitcast i8* %7 to i32* store i32 2, i32* %8, align 4, !tbaa !72 %9 = bitcast i8* %6 to i8*** store i8** getelementptr inbounds ([5 x i8*]* @_ZTV1B, i64 0, i64 2), i8*** %9, align 8 tail call void @llvm.dbg.value(metadata !{null}, i64 0, metadata !52), !dbg !75 %10 = bitcast i8* %1 to i32 (%class.A*)***, !dbg !77 %11 = load i32 (%class.A*)*** %10, align 8, !dbg !77 %12 = load i32 (%class.A*)** %11, align 8, !dbg !77 %13 = tail call i32 %12(%class.A* %2), !dbg !77 %14 = bitcast i8* %6 to %class.A*, !dbg !77 %15 = bitcast i8* %6 to i32 (%class.A*)***, !dbg !77 %16 = load i32 (%class.A*)*** %15, align 8, !dbg !77 %17 = getelementptr inbounds i32 (%class.A*)** %16, i64 1, !dbg !77 %18 = load i32 (%class.A*)** %17, align 8, !dbg !77 %19 = tail call i32 %18(%class.A* %14), !dbg !77 %20 = bitcast i8* %1 to i32 (%class.A*, %class.A*)***, !dbg !77 %21 = load i32 (%class.A*, %class.A*)*** %20, align 8, !dbg !77 %22 = getelementptr inbounds i32 (%class.A*, %class.A*)** %21, i64 2, !dbg !77 %23 = load i32 (%class.A*, %class.A*)** %22, align 8, !dbg !77 %24 = tail call i32 %23(%class.A* %2, %class.A* %14), !dbg !77 %25 = add i32 %13, 6, !dbg !77 %26 = add i32 %25, %19, !dbg !77 %27 = add i32 %26, %24, !dbg !77 tail call void @llvm.dbg.value(metadata !{i32 %27}, i64 0, metadata !54), !dbg !77 %28 = icmp eq i8* %1, null, !dbg !78 br i1 %28, label %30, label %29, !dbg !78 ; <label>:29 ; preds = %0 tail call void @_ZdlPv(i8* %1) nounwind, !dbg !78 br label %30, !dbg !78 ; <label>:30 ; preds = %29, %0 %31 = icmp eq i8* %6, null, !dbg !78 br i1 %31, label %33, label %32, !dbg !78 ; <label>:32 ; preds = %30 tail call void @_ZdlPv(i8* %6) nounwind, !dbg !78 br label %33, !dbg !78 ; <label>:33 ; preds = %32, %30 ret i32 %27, !dbg !79 } It's a bit long-winded, but from looking at the code it's clear that no virtual calls are actually necessary, yet Clang and LLVM generated both of them. In particular, we seek to implement the sort of analysis for devirtualization by Sonajalg et al in http://www.cs.ut.ee/~varmo/seminar/sem09S/final/s6najalg.pdf but in C++, even if all we can get is a more conservative approximation in most cases. It's basically a lot of type analysis, and involves querying properties about types -lower- in the hierarchy than the declared or instantiated type, which doesn't seem to be an operation supported by Clang or debug metadata directly, so the only option (as we see it) is to build a custom representation of the class hierarchy from data we -do- have access to which allows it. Well, or generate even more metadata. It's unclear to me how much assert/assume features would help. I can see it as useful for simplifying the process of determining how much precision is needed (e.g. in a file-scoped function, we know that its arguments can't come from somewhere external and hence could actually determine what the "lowest" type arguments can be), but it's unclear how this per se helps with obtaining type information, since the -g flag seems to generate sufficient data, but with no clear way to access it. It could just be myopia and ignorance on my part, though. Thanks for your help, -Vitor On Thu, Dec 8, 2011 at 2:04 PM, David Blaikie <dblaikie at gmail.com> wrote:> On Thu, Dec 8, 2011 at 9:56 AM, Vitor Luis Menezes <vitor at utexas.edu> > wrote: > > Hello all, > > > > Our compilers class has been using LLVM, and a partner and I decided to > > implement devirtualization of virtual C++ calls in LLVM as a class > project. > > We quickly realized that existing debug metadata generated by Clang > didn't > > give us enough info to (precisely) implement this, and as such have > already > > begun modifying Clang to insert such metadata. However, for > devirtualization > > we also need to reconstruct the class hierarchy, which we also seek to do > > through metadata. There appears to be sufficient metadata to do this, > but we > > can't seem to figure out how to actually access this metadata and > > successfully reconstruct the class hierarchy. Can anyone help with this? > > > > We're also open to alternative approaches, but we'd like to stay in LLVM > IR > > as much as possible. > > Some of this is already done by LLVM/Clang - do you have particular > cases that aren't being devirtualized that you want to focus on? > > For some brief background reading you might want to take a look at these > bugs: > http://llvm.org/bugs/show_bug.cgi?id=3100 > http://llvm.org/bugs/show_bug.cgi?id=810 > > Which are things I (& others more experienced than myself) have posted > some thoughts on. If you're interested in pursuing PR810 then I have > some code that Nick Lewycky passed on to me involving his experiments > in this domain. & I'm also going to CC Eric Christopher because he > mentioned he'd had some thoughts on how to achieve this (the general > problem described in 810 about how to pass assumptions/facts from the > frontend to the backend) & I never got around to asking him about the > details. > > This approach should stay even more in LLVM IR than your proposed > solution of metadata or debug info, but it may have > limitations/problems that your proposed approach does not - so I > certainly wouldn't rule anything out just yet. > > - David >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111208/a3cfee2a/attachment.html>
Vitor Luis Menezes wrote:> Hello all, > > Our compilers class has been using LLVM, and a partner and I decided to > implement devirtualization of virtual C++ calls in LLVM as a class > project. We quickly realized that existing debug metadata generated by > Clang didn't give us enough info to (precisely) implement this, and as > such have already begun modifying Clang to insert such metadata. > However, for devirtualization we also need to reconstruct the class > hierarchy, which we also seek to do through metadata. There appears to > be sufficient metadata to do this, but we can't seem to figure out how > to actually access this metadata and successfully reconstruct the class > hierarchy. Can anyone help with this? > > We're also open to alternative approaches, but we'd like to stay in LLVM > IR as much as possible.Implement field-sensitive interprocedural sparse conditional constant propagation and it will devirtualize. The reason I don't like having an explicit devirtualization pass is that it requires a large amount of analysis (interprocedural, no less!) and is strictly single-purpose. The way that LLVM does devirtualization now is through a series of tons of tiny steps: * the very last stage in devirt is simply converting the load from a constant to the loaded constant: - but sometimes the frontend wouldn't normally emit the vtable at all (the ABI wouldn't require it to), so we wouldn't see the constant to propagate. We added a new linkage type to LLVM, "available_externally" designed to let us capture this extra data that's useful for the optimizer but without generating any additional symbols in the object file. * GVN folds loads against stores, using alias analysis. * alias analysis is helped by two interprocedural function attributes: - noalias return values. This indicates that the pointer returned does not alias any other pointer existing in the program. We will find functions that return these and propagate noalias up the call graph. - nocapture arguments. If the pointer doesn't escape the callee, then the function gets a 'nocapture' annotation on it. This is also propagated through the call graph. - standard library functions have their noalias and nocapture parts annotated. - together, this means that code like: FILE *f = fopen(...); fwrite(data, size, 1, f); fclose(f); has a pointer 'f' which can never possibly alias anything else in the program, and that's trivially provable. We also assume that "char *p = (char*)rand();" can never guess the address of 'f'. * the pass manager runs bottom-up over the callgraph, running the CallGraphSCCPasses over each strongly connected component, and all the function passes over each function. If we inline, we expose more code to GVN which may make it possible to fold a load against its store. - once we do so, we've changed the apparent call graph, while iterating over the call graph. We use this to refine the SCCs in our and re-run the optimization passes again, causing more refinement iteratively, allowing us to devirtualize more. - we can only inline functions defined inside a class body because of C++'s ODR rule. The actual linkage type is "linkonce" which would indicate that they are weak and can be discarded if never called, so we added "linkonce_odr" which indicates that the symbol is weak but that any replacement must be equivalent. What's great about using small steps like this is that each of these pieces is a useful optimization in its own right, even if we don't end up devirtualizing. A devirtualization pass doesn't "smell right" to me because it will have to do lots of analysis regardless of whether it uses it to optimize or throws it away at the end. In LLVM, the optimization passes try hard to be inexpensive when they don't transform the code at all. It's also language specific, while LLVM's existing approach works even with other languages or hand-rolled type hierarchies written with casts and no language feature support. The piece that's missing is devirtualization when we don't end up inlining. I don't know of a way to break that down into smaller pieces; llvm already has SCCP.cpp which does sparse conditional constant propagation and that includes an interprocedural variant, IPSCCP, but it's not field-sensitive. Propagating the stores from the vptr field to the relevant loads in the functions that aren't inlined would give us the last major missing piece. Noalias returns, nocapture, SCC refinement, linkonce_odr and available_externally were added with the goal of making devirtualization in LLVM happen, but as orthogonal independent optimizations. I think LLVM should continue with this design. If you want to implement a single substantial optimization pass, I would suggest FSIPSCCP as the largest thing you should write. Nick
On Dec 8, 2011, at 10:03 PM, Nick Lewycky wrote:> Noalias returns, nocapture, SCC refinement, linkonce_odr and > available_externally were added with the goal of making devirtualization > in LLVM happen, but as orthogonal independent optimizations. I think > LLVM should continue with this design. If you want to implement a single > substantial optimization pass, I would suggest FSIPSCCP as the largest > thing you should write.This is a lot of work that is going to be completely foiled by the presence of almost any opaque call at all. What's needed here is a language-independent way to exploit language-specific guarantees like C++ [basic.life]p7, i.e. that certain provenances of pointer guarantee that certain members have known, immutable values. There are analogous guarantees basically everywhere; for example, Java arrays have a length, ADTs in System F have a discriminator, etc. I would suggest an intrinsic like declare i8* @llvm.bless.known.memory(i8*, …) nounwind readnone where the … is a sequence of offset/value pairs. The load peephole is then quite obvious. An interesting extra case from [basic.life]p7 is that we can also state that 'const' fields are invariant after the "blessing point", although (unlike ivars) we can't necessarily assign a fixed value to them right then. John.