When I used -std-compile-opts -disable-inlining, my transform didn't happen. I think that in your test, inlining UseCallback into foo automatically made the function pointer a constant, which turned it into a direct call that was then inlined.

If UseCallback is too big to inline and uses the callback parameter inside a loop, this transform is potentially valuable, particularly if UseCallback is called multiple times with the same callback parameter.

Interestingly, when I had foo call UseCallback multiple times with *only* callback1, it yanked the function pointer parameter out of UseCallback and turned the whole thing into a direct call (I'm guessing dead argument elimination came into play here). But as soon as I added a call to UseCallback with callback2 to the mix, it went back to performing no indirect call elimination at all.

On Fri, Jun 4, 2010 at 11:11 AM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Kenneth,
>
>> By that I mean an optimization pass (or a combination of them) that turns:
> ...
>> With that transform in place, lots of inlining becomes possible, and
>> direct function calls replace indirect function calls if inlining
>> isn't appropriate. If this transform is combined with argpromotion
>> and scalarrepl, it can be used for devirtualization of C++ virtual
>> function calls.
>>
>> There seems to be an awful lot of C++ code out there that uses
>> templates to perform this same optimization in source code.
>
> yes, LLVM does this. For example, running your example through the LLVM
> optimizers gives:
>
> define void @foo() nounwind readnone {
> entry:
>   ret void
> }
>
> As you can see, the indirect function calls were resolved into direct
> function calls and inlined.
>
> I don't know which passes take care of this however.
>
> Ciao,
>
> Duncan.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
It should be relatively simple to write a pass that turns each call that has constant argument(s) into a call to a specialized version of the callee. To devirtualize C++ calls it needs to be smarter, since the argument is not a constant, but a pointer to a struct that points to a constant. However, the tricky parts are:

1) Knowing when to perform specialization. If the call was not inlined, the function is probably big. Getting this wrong will generate *a lot* of code for a very small (if not negative) speed gain.

2) Sharing specializations between different call sites that pass the same constants.

Getting 1) right is crucial but hard. The easy cases are already handled by inlining and dead argument elimination. If good profiling information is available, it can be used to estimate the speed/space trade-off (specialize calls from hot code).

Eugene

On Fri, Jun 4, 2010 at 6:29 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:
> When I used -std-compile-opts -disable-inlining, my transform didn't
> happen. I think in your test, the inline of UseCallback into foo
> automatically made the function pointer into a constant, which turned
> it into a direct call that was then inlined.
>
> If UseCallback is too big to inline and uses the callback parameter
> inside a loop, this transform is potentially valuable, particularly if
> UseCallback is called multiple times with the same callback parameter.
>
> Interestingly, when I had foo call UseCallback multiple times with
> *only* callback1, it yanked the function pointer parameter out of
> UseCallback and turned the thing into a direct call. (I'm guessing
> dead argument elimination came into play here) But as soon as I added
> a call to UseCallback with callback2 to the mix, it went back to not
> making any indirect call elimination.
>
> On Fri, Jun 4, 2010 at 11:11 AM, Duncan Sands <baldrick at free.fr> wrote:
>> Hi Kenneth,
>>
>>> By that I mean an optimization pass (or a combination of them) that turns:
>> ...
On Fri, Jun 4, 2010 at 1:35 PM, Eugene Toder <eltoder at gmail.com> wrote:
> It should be relatively simple to write a pass that turns each call
> that has constant argument(s) into a call to specialized version of
> the callee. To devirtualize C++ calls it needs to be smarter, since
> the argument is not a constant, but a pointer to a struct that points
> to a constant. However, the trick here is
> 1) Knowing when to perform specialization. If the call was not inlined
> the function is probably big. Getting this wrong will generate *a lot*
> of code for very small (if not negative) speed gain.
> 2) Sharing of specializations from different call sites that have the
> same constants.
> Getting 1) right is crucial but hard. Easy cases are already taken by
> inline and dead argument elimination. If some good profiling
> information is available it can be used for speed/space trade off
> estimation (specialize calls from hot code).

As the number of call sites using the same constant grows, inlining gets more expensive while specializing does not: the cost of specializing grows only with the number of unique constant combinations specialized. So cases where you'd want to specialize but not inline shouldn't be all that uncommon, and different cost calculations are needed to set the threshold.

I didn't see the partial specialization pass in the docs, but I'll take a look at it now.
Hi,

> 1) Knowing when to perform specialization. If the call was not inlined
> the function is probably big. Getting this wrong will generate *a lot*
> of code for very small (if not negative) speed gain.

Could you elaborate on why just having (lots of) more code in the final executable would incur a performance _penalty_? I was thinking of something similar, but for type specializations of functions in a dynamically-typed language, so that the frontend creates more than one function for each function in the source code.

> 2) Sharing of specializations from different call sites that have the
> same constants.
> Getting 1) right is crucial but hard. Easy cases are already taken by
> inline and dead argument elimination. If some good profiling
> information is available it can be used for speed/space trade off
> estimation (specialize calls from hot code).
>
> Eugene

Cornelius