Pekka Nikander
2010-Jun-04 12:18 UTC
[LLVMdev] Speculative phi elimination at the top of a loop?
I am working on heavily optimising unusually static C++ code, and have encountered a situation where I basically want an optimiser that would speculatively unroll a loop to see if the first round of the loop could be optimised further. (I happen to know that it is possible.) The previous optimisations that produce the loop in the first place already do a magical job (relying heavily on constant propagation), transforming cross-class tail recursion into the loop that I am now addressing. Hence, there is probably little than can be done in the previous optimisations. So, has anyone worked on an optimisation where the optimiser unrolls a loop so a way that allows it to speculatively try if the first round can be further simplified? In my case, the loop actually calls a C++ virtual method, but since the instance object is a *constant* on the first round of the loop, it is possible to resolve the method call into a static function, and then inline the static function, which in this case essentially eliminates the first round of the loop. Here is an example. Let me have a simple loop bb.i, calling a C++ inner class virtual method in SSA: --------- define void @test() { entry: %0 = alloca %struct.Foo, align 8 br label %bb.i bb.i: %1 = phi %struct.Foo* [ %11, %bb.i ], [ %0, %entry ] %t = phi %struct.Baseclass* [ %5, %bb.i ], [ getelementptr inbounds (%struct.Bar* @_Const, i64 0, i32 0), %entry ] %2 = getelementptr inbounds %struct.Baseclass* %t, i64 0, i32 1 %3 = load %struct.Innerclass** %2, align 8 %4 = getelementptr inbounds %struct.Innerclass* %3, i64 0, i32 1 %5 = load %struct.Baseclass** %4, align 8 %6 = getelementptr inbounds %struct.Baseclass* %5, i64 0, i32 0 %7 = load i32 (...)*** %6, align 8 %8 = getelementptr inbounds i32 (...)** %7, i64 2 %9 = load i32 (...)** %8, align 8 %10 = bitcast i32 (...)* %9 to %struct.Foo* (%struct.Baseclass*, %struct.Foo*)* %11 = call %struct.Foo *%10(%struct.Baseclass *%5, %struct.Foo* %1) %12 = icmp eq %struct.Foo* %11, null br i1 %12, label %exit, label %bb.i exit: ret void } -------- What I want essentially to do is to unroll this in a way that allows speculative optimisation of the first round: -------- define void @test() { entry: %0 = alloca %struct.Foo, align 8 br label %unrolled unrolled: %ut = %struct.Baseclass* getelementptr inbounds %struct.Bar* @_Const, i64 0, i32 0 %u2 = getelementptr inbounds %struct.Baseclass* %ut, i64 0, i32 1 %u3 = load %struct.Innerclass** %u2, align 8 %u4 = getelementptr inbounds %struct.Innerclass* %u3, i64 0, i32 1 %u5 = load %struct.Baseclass** %u4, align 8 %u6 = getelementptr inbounds %struct.Baseclass* %u5, i64 0, i32 0 %u7 = load i32 (...)*** %u6, align 8 %u8 = getelementptr inbounds i32 (...)** %u7, i64 2 %u9 = load i32 (...)** %u8, align 8 %u10 = bitcast i32 (...)* %u9 to %struct.Foo* (%struct.Baseclass*, %struct.Foo*)* %u11 = call %struct.Foo *%u10(%struct.Baseclass *%u5, %struct.Foo* %0) %u12 = icmp eq %struct.Foo* %u11, null br i1 %u12, label %exit, label %bb.i bb.i: %1 = phi %struct.Foo* [ %11, %bb.i ], [ %u11, %unrolled ] %t = phi %struct.Baseclass* [ %5, %bb.i ], [ %u5, %unrolled ] %2 = getelementptr inbounds %struct.Baseclass* %t, i64 0, i32 1 %3 = load %struct.Innerclass** %2, align 8 %4 = getelementptr inbounds %struct.Innerclass* %3, i64 0, i32 1 %5 = load %struct.Baseclass** %4, align 8 %6 = getelementptr inbounds %struct.Baseclass* %5, i64 0, i32 0 %7 = load i32 (...)*** %6, align 8 %8 = getelementptr inbounds i32 (...)** %7, i64 2 %9 = load i32 (...)** %8, align 8 %10 = bitcast i32 (...)* %9 to %struct.Foo* (%struct.Baseclass*, %struct.Foo*)* %11 = call %struct.Foo *%10(%struct.Baseclass *%5, %struct.Foo* %1) %12 = icmp eq %struct.Foo* %11, null br i1 %12, label %exit, label %bb.i exit: ret void } -------- Now, since %ut is above a constant, it allows constant propagation to resolve %u2, %u3, %u4 etc up to %u10, which also turns out to be a constant. As a result, the dynamic call to %u10 can be resolved to a static function call, which then can be inlined. Being a newbie with LLVM (but with quite some past experience with GCC from 1999-2000), I'd be interested in any ideas of how to approach this. Would the best way be to add an option to -loop-unroll, and hack away at lib/Transforms/Utils/LoopUnroll.cpp? --Pekka Nikander
Devang Patel
2010-Jun-04 17:36 UTC
[LLVMdev] Speculative phi elimination at the top of a loop?
Hi, On Fri, Jun 4, 2010 at 5:18 AM, Pekka Nikander <pekka.nikander at nomadiclab.com> wrote:> Would the best way be to add an option to -loop-unroll, and hack away at lib/Transforms/Utils/LoopUnroll.cpp?Instead, the better alternative is to write another pass similar to LoopUnrollPass.cpp (say LoopPeelPass.cpp) and add new option -loop-peel. The new pass could use llvm::UnrollLoop() utility function. Feel free to adjust this utility function if required, but the core utility function should be used by both passes. - Devang
Daniel Berlin
2010-Jun-04 17:43 UTC
[LLVMdev] Speculative phi elimination at the top of a loop?
On Fri, Jun 4, 2010 at 8:18 AM, Pekka Nikander <pekka.nikander at nomadiclab.com> wrote:> I am working on heavily optimising unusually static C++ code, and have encountered a situation where I basically want an optimiser that would speculatively unroll a loop to see if the first round of the loop could be optimised further. (I happen to know that it is possible.) The previous optimisations that produce the loop in the first place already do a magical job (relying heavily on constant propagation), transforming cross-class tail recursion into the loop that I am now addressing. Hence, there is probably little than can be done in the previous optimisations. > > So, has anyone worked on an optimisation where the optimiser unrolls a loop so a way that allows it to speculatively try if the first round can be further simplified? > > In my case, the loop actually calls a C++ virtual method, but since the instance object is a *constant* on the first round of the loop, it is possible to resolve the method call into a static function, and then inline the static function, which in this case essentially eliminates the first round of the loop.The combination of GVN-PRE and SCCVN in GCC will do this, and we actually block it from doing this in a lot of cases. Basically, it can discover some value is constant on the first run through a loop, and will create additional phi nodes to represent those values. In most cases, this is not that valuable (IE it would discover i = i + 1 is 2 on the first run through the loop, and replace it with a phi of (2, <new calculation>) , and in fact, was significantly confusing to SCEV analysis. So we blocked it. In your case, it would be valuable however. --Dan
Pekka Nikander
2010-Jun-04 19:36 UTC
[LLVMdev] Speculative phi elimination at the top of a loop?
Hi Devang, On Fri, Jun 4, 2010 at 5:18 AM, Pekka Nikander <pekka.nikander at nomadiclab.com> wrote:>> Would the best way be to add an option to -loop-unroll, and hack away at lib/Transforms/Utils/LoopUnroll.cpp?On 2010-06 -04, at 20:36 , Devang Patel wrote:> Instead, the better alternative is to write another pass similar to > LoopUnrollPass.cpp (say LoopPeelPass.cpp) and add new option > -loop-peel. The new pass could use llvm::UnrollLoop() utility > function. Feel free to adjust this utility function if required, but > the core utility function should be used by both passes.Thanks, we'll do that. BTW, I wanted to know why I didn't know the term, and found this: Apparently loop peeling was included into GCC 3.4 around 2003-2004. The previous time I did optimisation was in 1999-2000, and it was with GCC 2.95.... The GCC folks seem to cite the Cohn and Lowney paper [1] from 2000. Hwu et al [2] is from 1993, but it doesn't use the term. No wonder I didn't know this is called peeling. :-) Hence, it looks like that my past experience is ancient by now. I guess I need to dig out more recent papers referring to those, as well as seeing what Kennedy and Allen write in their book (which is cited at the Wikipedia page). Any other good pieces of work to update my knowledge? [1] Robert Cohn and P. Geoffrey Lowney, "Design and Analysis of Profile-Based Optimization in Compaq's Compilation Tools for Alpha", Journal of Instruction Level Parallelism, 2000. [2] W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. O. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery, “The Superblock: An Effective Technique for VLIW and Superscalar Compilation ,” The Journal of Supercomputing, Kluwer Aca- demic Publishers, 1993, pp. 229-248. --Pekka
Nick Lewycky
2010-Jun-07 20:21 UTC
[LLVMdev] Speculative phi elimination at the top of a loop?
On 4 June 2010 10:43, Daniel Berlin <dberlin at dberlin.org> wrote:> On Fri, Jun 4, 2010 at 8:18 AM, Pekka Nikander > <pekka.nikander at nomadiclab.com> wrote: > > I am working on heavily optimising unusually static C++ code, and have > encountered a situation where I basically want an optimiser that would > speculatively unroll a loop to see if the first round of the loop could be > optimised further. (I happen to know that it is possible.) The previous > optimisations that produce the loop in the first place already do a magical > job (relying heavily on constant propagation), transforming cross-class tail > recursion into the loop that I am now addressing. Hence, there is probably > little than can be done in the previous optimisations. > > > > So, has anyone worked on an optimisation where the optimiser unrolls a > loop so a way that allows it to speculatively try if the first round can be > further simplified? > > > > In my case, the loop actually calls a C++ virtual method, but since the > instance object is a *constant* on the first round of the loop, it is > possible to resolve the method call into a static function, and then inline > the static function, which in this case essentially eliminates the first > round of the loop. > > The combination of GVN-PRE and SCCVN in GCC will do this, and we > actually block it from doing this in a lot of cases. > > Basically, it can discover some value is constant on the first run > through a loop, and will create additional phi nodes to represent > those values. > > In most cases, this is not that valuable (IE it would discover i = i + > 1 is 2 on the first run through the loop, and replace it with a phi of > (2, <new calculation>) , and in fact, was significantly confusing to > SCEV analysis. So we blocked it. In your case, it would be valuable > however. >I would suggest trying to do this on any pointer-typed const-expr. Thoughts? Nick> > --Dan > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100607/d1ab8be1/attachment.html>
Pekka Nikander
2010-Sep-21 10:23 UTC
[LLVMdev] Loop unroll pass (was: Re: Speculative phi elimination at the top of a loop?)
>> Would the best way be to add an option to -loop-unroll, and hack away at lib/Transforms/Utils/LoopUnroll.cpp? > > Instead, the better alternative is to write another pass similar to > LoopUnrollPass.cpp (say LoopPeelPass.cpp) and add new option > -loop-peel. The new pass could use llvm::UnrollLoop() utility > function. Feel free to adjust this utility function if required, but > the core utility function should be used by both passesIt took some time (mainly because of the rest of the project), but I have now an early version of LoopPeelPass.cpp implemented. It uses the llvm::UnrollPass() utility function; I added one argument, "bool Peeling", to the UnrollPass() function, with the default value of "false". There still some problems related to the block merging in the end of llvm::UnrollPass(), but otherwise it seems to work. I'm afraid it will take some more time before I have a good and clean patch ready, but in the mean time if anyone would like to review the changed, I'd be more than happy to send a diff off-list. --Pekka