thr3ads.net - llvm dev - [LLVMdev] Speculative phi elimination at the top of a loop? [Jun 2010]

If this information is useful, please help other people find it:
Share via:

Pekka Nikander

2010-Jun-04 12:18 UTC

[LLVMdev] Speculative phi elimination at the top of a loop?

I am working on heavily optimising unusually static C++ code, and have
encountered a situation where I basically want an optimiser that would
speculatively unroll a loop to see if the first round of the loop could be
optimised further.  (I happen to know that it is possible.)  The previous
optimisations that produce the loop in the first place already do a magical job
(relying heavily on constant propagation), transforming cross-class tail
recursion into the loop that I am now addressing. Hence, there is probably
little than can be done in the previous optimisations.

So, has anyone worked on an optimisation where the optimiser unrolls a loop so a
way that allows it to speculatively try if the first round can be further
simplified?

In my case, the loop actually calls a C++ virtual method, but since the instance
object is a *constant* on the first round of the loop, it is possible to resolve
the method call into a static function, and then inline the static function,
which in this case essentially eliminates the first round of the loop.

Here is an example.  Let me have a simple loop bb.i, calling a C++ inner class
virtual method in SSA:

---------
define void @test() {
entry:
  %0 = alloca %struct.Foo, align 8
  br label %bb.i
  
bb.i:
  %1 = phi %struct.Foo* [ %11, %bb.i ], [ %0, %entry ]
  %t = phi %struct.Baseclass* [ %5, %bb.i ], [ getelementptr inbounds
(%struct.Bar* @_Const, i64 0, i32 0), %entry ]
  %2 = getelementptr inbounds %struct.Baseclass* %t, i64 0, i32 1
  %3 = load %struct.Innerclass** %2, align 8
  %4 = getelementptr inbounds %struct.Innerclass* %3, i64 0, i32 1
  %5 = load %struct.Baseclass** %4, align 8
  %6 = getelementptr inbounds %struct.Baseclass* %5, i64 0, i32 0 
  %7 = load i32 (...)*** %6, align 8
  %8 = getelementptr inbounds i32 (...)** %7, i64 2
  %9 = load i32 (...)** %8, align 8
  %10 = bitcast i32 (...)* %9 to %struct.Foo* (%struct.Baseclass*,
%struct.Foo*)*
  %11 = call %struct.Foo *%10(%struct.Baseclass *%5, %struct.Foo* %1)
  %12 = icmp eq %struct.Foo* %11, null
  br i1 %12, label %exit, label %bb.i

exit:
  ret void
}
--------

What I want essentially to do is to unroll this in a way that allows speculative
optimisation of the first round:

--------
define void @test() {
entry:
  %0 = alloca %struct.Foo, align 8
  br label %unrolled

unrolled:
  %ut = %struct.Baseclass* getelementptr inbounds %struct.Bar* @_Const, i64 0,
i32 0
  %u2 = getelementptr inbounds %struct.Baseclass* %ut, i64 0, i32 1
  %u3 = load %struct.Innerclass** %u2, align 8
  %u4 = getelementptr inbounds %struct.Innerclass* %u3, i64 0, i32 1
  %u5 = load %struct.Baseclass** %u4, align 8
  %u6 = getelementptr inbounds %struct.Baseclass* %u5, i64 0, i32 0 
  %u7 = load i32 (...)*** %u6, align 8
  %u8 = getelementptr inbounds i32 (...)** %u7, i64 2
  %u9 = load i32 (...)** %u8, align 8
  %u10 = bitcast i32 (...)* %u9 to %struct.Foo* (%struct.Baseclass*,
%struct.Foo*)*
  %u11 = call %struct.Foo *%u10(%struct.Baseclass *%u5, %struct.Foo* %0)
  %u12 = icmp eq %struct.Foo* %u11, null
  br i1 %u12, label %exit, label %bb.i
  
bb.i:
  %1 = phi %struct.Foo* [ %11, %bb.i ], [ %u11, %unrolled ]
  %t = phi %struct.Baseclass* [ %5, %bb.i ], [ %u5, %unrolled ]
  %2 = getelementptr inbounds %struct.Baseclass* %t, i64 0, i32 1
  %3 = load %struct.Innerclass** %2, align 8
  %4 = getelementptr inbounds %struct.Innerclass* %3, i64 0, i32 1
  %5 = load %struct.Baseclass** %4, align 8
  %6 = getelementptr inbounds %struct.Baseclass* %5, i64 0, i32 0 
  %7 = load i32 (...)*** %6, align 8
  %8 = getelementptr inbounds i32 (...)** %7, i64 2
  %9 = load i32 (...)** %8, align 8
  %10 = bitcast i32 (...)* %9 to %struct.Foo* (%struct.Baseclass*,
%struct.Foo*)*
  %11 = call %struct.Foo *%10(%struct.Baseclass *%5, %struct.Foo* %1)
  %12 = icmp eq %struct.Foo* %11, null
  br i1 %12, label %exit, label %bb.i

exit:
  ret void
}
--------

Now, since %ut is above a constant, it allows constant propagation to resolve
%u2, %u3, %u4 etc up to %u10, which also turns out to be a constant.  As a
result, the dynamic call to %u10 can be resolved to a static function call,
which then can be inlined.

Being a newbie with LLVM (but with quite some past experience with GCC from
1999-2000), I'd be interested in any ideas of how to approach this.  Would
the best way be to add an option to -loop-unroll, and hack away at
lib/Transforms/Utils/LoopUnroll.cpp?

--Pekka Nikander

Devang Patel

2010-Jun-04 17:36 UTC

head link

[LLVMdev] Speculative phi elimination at the top of a loop?

Hi,

On Fri, Jun 4, 2010 at 5:18 AM, Pekka Nikander
<pekka.nikander at nomadiclab.com> wrote:>  Would the best way be to add an option to -loop-unroll, and hack away at
lib/Transforms/Utils/LoopUnroll.cpp?
Instead, the better alternative is to write another pass similar to
LoopUnrollPass.cpp (say LoopPeelPass.cpp) and add new option
-loop-peel. The new pass could use llvm::UnrollLoop() utility
function. Feel free to adjust this utility function if required, but
the core utility function should be used by both passes.
-
Devang

Daniel Berlin

2010-Jun-04 17:43 UTC

head link

[LLVMdev] Speculative phi elimination at the top of a loop?

On Fri, Jun 4, 2010 at 8:18 AM, Pekka Nikander
<pekka.nikander at nomadiclab.com> wrote:> I am working on heavily optimising unusually static C++ code, and have
encountered a situation where I basically want an optimiser that would
speculatively unroll a loop to see if the first round of the loop could be
optimised further.  (I happen to know that it is possible.)  The previous
optimisations that produce the loop in the first place already do a magical job
(relying heavily on constant propagation), transforming cross-class tail
recursion into the loop that I am now addressing. Hence, there is probably
little than can be done in the previous optimisations.
>
> So, has anyone worked on an optimisation where the optimiser unrolls a loop
so a way that allows it to speculatively try if the first round can be further
simplified?
>
> In my case, the loop actually calls a C++ virtual method, but since the
instance object is a *constant* on the first round of the loop, it is possible
to resolve the method call into a static function, and then inline the static
function, which in this case essentially eliminates the first round of the loop.
The combination of GVN-PRE and SCCVN in GCC will do this, and we
actually block it from doing this in a lot of cases.

Basically, it can discover some value is constant on the first run
through a loop, and will create additional phi nodes to represent
those values.

In most cases, this is not that valuable (IE it would discover i = i +
1 is 2 on the first run through the loop, and replace it with a phi of
(2, <new calculation>) , and in fact, was significantly confusing to
SCEV analysis. So we blocked it. In your case, it would be valuable
however.

--Dan

Pekka Nikander

2010-Jun-04 19:36 UTC

head link

[LLVMdev] Speculative phi elimination at the top of a loop?

Hi Devang,

On Fri, Jun 4, 2010 at 5:18 AM, Pekka Nikander <pekka.nikander at
nomadiclab.com> wrote:>>  Would the best way be to add an option to -loop-unroll, and hack away
at lib/Transforms/Utils/LoopUnroll.cpp?
On 2010-06 -04, at 20:36 , Devang Patel wrote:> Instead, the better alternative is to write another pass similar to
> LoopUnrollPass.cpp (say LoopPeelPass.cpp) and add new option
> -loop-peel. The new pass could use llvm::UnrollLoop() utility
> function. Feel free to adjust this utility function if required, but
> the core utility function should be used by both passes.
Thanks, we'll do that.  

BTW, I wanted to know why I didn't know the term, and found this: 
Apparently loop peeling was included into GCC 3.4 around 2003-2004.  The
previous time I did optimisation was in 1999-2000, and it was with GCC 2.95.... 
The GCC folks seem to cite the Cohn and Lowney paper [1] from 2000.  Hwu et al
[2] is from 1993, but it doesn't use the term.  No wonder I didn't know
this is called peeling. :-)

Hence, it looks like that my past experience is ancient by now.  I guess I need
to dig out more recent papers referring to those, as well as seeing what Kennedy
and Allen write in their book (which is cited at the Wikipedia page).  Any other
good pieces of work to update my knowledge?

[1] Robert Cohn and P. Geoffrey Lowney, "Design and Analysis of
Profile-Based Optimization in Compaq's Compilation Tools for Alpha",
Journal of Instruction Level Parallelism, 2000.

[2] W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A.
Bringmann, R. O. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and
D. M. Lavery, “The Superblock: An Effective Technique for VLIW and Superscalar
Compilation ,” The Journal of Supercomputing, Kluwer Aca- demic Publishers,
1993, pp. 229-248.

--Pekka

Nick Lewycky

2010-Jun-07 20:21 UTC

head link

[LLVMdev] Speculative phi elimination at the top of a loop?

On 4 June 2010 10:43, Daniel Berlin <dberlin at dberlin.org> wrote:
> On Fri, Jun 4, 2010 at 8:18 AM, Pekka Nikander
> <pekka.nikander at nomadiclab.com> wrote:
> > I am working on heavily optimising unusually static C++ code, and have
> encountered a situation where I basically want an optimiser that would
> speculatively unroll a loop to see if the first round of the loop could be
> optimised further.  (I happen to know that it is possible.)  The previous
> optimisations that produce the loop in the first place already do a magical
> job (relying heavily on constant propagation), transforming cross-class
tail
> recursion into the loop that I am now addressing. Hence, there is probably
> little than can be done in the previous optimisations.
> >
> > So, has anyone worked on an optimisation where the optimiser unrolls a
> loop so a way that allows it to speculatively try if the first round can be
> further simplified?
> >
> > In my case, the loop actually calls a C++ virtual method, but since
the
> instance object is a *constant* on the first round of the loop, it is
> possible to resolve the method call into a static function, and then inline
> the static function, which in this case essentially eliminates the first
> round of the loop.
>
> The combination of GVN-PRE and SCCVN in GCC will do this, and we
> actually block it from doing this in a lot of cases.
>
> Basically, it can discover some value is constant on the first run
> through a loop, and will create additional phi nodes to represent
> those values.
>
> In most cases, this is not that valuable (IE it would discover i = i +
> 1 is 2 on the first run through the loop, and replace it with a phi of
> (2, <new calculation>) , and in fact, was significantly confusing to
> SCEV analysis. So we blocked it. In your case, it would be valuable
> however.
>
I would suggest trying to do this on any pointer-typed const-expr. Thoughts?

Nick

>
> --Dan
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100607/d1ab8be1/attachment.html>

Pekka Nikander

2010-Sep-21 10:23 UTC

head link

[LLVMdev] Loop unroll pass (was: Re: Speculative phi elimination at the top of a loop?)

>>  Would the best way be to add an option to -loop-unroll, and hack away
at lib/Transforms/Utils/LoopUnroll.cpp?
> 
> Instead, the better alternative is to write another pass similar to
> LoopUnrollPass.cpp (say LoopPeelPass.cpp) and add new option
> -loop-peel. The new pass could use llvm::UnrollLoop() utility
> function. Feel free to adjust this utility function if required, but
> the core utility function should be used by both passes
It took some time (mainly because of the rest of the project), but I have now an
early version of LoopPeelPass.cpp implemented.  It uses the llvm::UnrollPass()
utility function; I added one argument, "bool Peeling", to the
UnrollPass() function, with the default value of "false".

There still some problems related to the block merging in the end of
llvm::UnrollPass(), but otherwise it seems to work.

I'm afraid it will take some more time before I have a good and clean patch
ready, but in the mean time if anyone would like to review the changed, I'd
be more than happy to send a diff off-list.

--Pekka

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Jun 2010 - [LLVMdev] Speculative phi elimination at the top of a loop?

[LLVMdev] Speculative phi elimination at the top of a loop?

[LLVMdev] Speculative phi elimination at the top of a loop?

[LLVMdev] Speculative phi elimination at the top of a loop?

[LLVMdev] Speculative phi elimination at the top of a loop?

[LLVMdev] Speculative phi elimination at the top of a loop?

[LLVMdev] Loop unroll pass (was: Re: Speculative phi elimination at the top of a loop?)

Reasonably Related Threads