thr3ads.net - llvm dev - [LLVMdev] dominance frontiers [Jan 2012]

If this information is useful, please help other people find it:
Share via:

Daniel Berlin

2012-Jan-07 04:27 UTC

[LLVMdev] dominance frontiers

On Fri, Jan 6, 2012 at 8:17 PM, Chris Lattner <clattner at apple.com>
wrote:>
> On Jan 6, 2012, at 5:08 PM, Chris Lattner wrote:
>
>>>>
>>>> It's very like SSA construction, but must make provision
>>>> testing anti dependences.  I had planned to use dominance
frontiers to
>>>> guide placement of phi nodes, as usual.
>>>
>>> Ok, in that case, please check out
include/llvm/Transforms/Utils/SSAUpdater.h,
>>> which is the preferred API (that doesn't use dom frontiers) to
build SSA.
>>
>> What approach does it use?  I look at the code, but don't really
see
>> an explanation.
>
> Sorry, my reading comprehension skills were low, I thought you were asking
about MemDep for some reason.  SSAUpdater uses simple local scans across the
CFG, excelling at local updates of values.
Note: GCC takes exactly the same approach as LLVM here, for exactly
the reason chris specifies.
In fact, until we started local SSA updating (which is now many years
ago, but ...), dominance frontier calculation for ssa updating was in
the top 10 profile functions for GCC compiles of large source files.
I had tried a number of the linear time phi placement algorithms
instead, but sadly, none were fast enough in practice (I believe
someone later published a paper comparing them all and finding the
same thing).

As a completely random aside, the algorithm given in the original
paper you cited is actually not quite the best way in practice.  It
performs more comparisons than strictly necessary. When I moved us to
the algorithm found in http://www.cs.rice.edu/~keith/Embed/dom.pdf,
Figure 5, it was ~30% faster (AFAIK, Jeanne Ferrante is somewhat
responsible for both algorithms :P). It is also a lot simpler to
explain, IMHO.

Both of these things happened around 2005 (local SSA updating in GCC
and replacement of the dominance frontier algorithm), so things may be
different now.

Cameron Zwarich

2012-Jan-07 05:14 UTC

head link

[LLVMdev] dominance frontiers

On Jan 6, 2012, at 8:27 PM, Daniel Berlin wrote:
> Note: GCC takes exactly the same approach as LLVM here, for exactly
> the reason chris specifies.
> In fact, until we started local SSA updating (which is now many years
> ago, but ...), dominance frontier calculation for ssa updating was in
> the top 10 profile functions for GCC compiles of large source files.
> I had tried a number of the linear time phi placement algorithms
> instead, but sadly, none were fast enough in practice (I believe
> someone later published a paper comparing them all and finding the
> same thing).
I wrote the current implementation of the up-front SSA construction in LLVM,
found in Utils/PromoteMemoryToRegister.cpp. It uses a modified version of the
Sreedhar-Gao algorithm from "A linear time algorithm for placing
phi-nodes". The most interesting modification is that it uses per-variable
liveness information to compute pruned SSA by pruning the entire IDF
computation, rather than merely pruning the DF -> IDF step like is usually
done. I was a bit surprised that it worked so well, given the reputation of the
"linear time" algorithms, but it performed the best out of everything
I attempted.
> As a completely random aside, the algorithm given in the original
> paper you cited is actually not quite the best way in practice.  It
> performs more comparisons than strictly necessary. When I moved us to
> the algorithm found in http://www.cs.rice.edu/~keith/Embed/dom.pdf,
> Figure 5, it was ~30% faster (AFAIK, Jeanne Ferrante is somewhat
> responsible for both algorithms :P). It is also a lot simpler to
> explain, IMHO.
There is a followup paper from other authors (including Tarjan himself) that
discusses some implementation tricks for the Lengauer-Tarjan algorithm:

http://www.cs.princeton.edu/~rwerneck/docs/GWTTA04.pdf

They cite personal communication from Keith Cooper that a later more careful
implementation of Lengauer-Tarjan led to different benchmark results than what
they published. I added those tricks and some others to LLVM's
implementation of Lengauer-Tarjan (the simple O(n log n) version) for a
significant performance improvement. I should do a comparison with a careful
implementation of the Cooper-Harvey-Kennedy algorithm some day.

Cameron
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120106/de67b49a/attachment.html>

Daniel Berlin

2012-Jan-07 06:40 UTC

head link

[LLVMdev] dominance frontiers

On Sat, Jan 7, 2012 at 12:14 AM, Cameron Zwarich <zwarich at apple.com>
wrote:> On Jan 6, 2012, at 8:27 PM, Daniel Berlin wrote:
>
> Note: GCC takes exactly the same approach as LLVM here, for exactly
> the reason chris specifies.
> In fact, until we started local SSA updating (which is now many years
> ago, but ...), dominance frontier calculation for ssa updating was in
> the top 10 profile functions for GCC compiles of large source files.
> I had tried a number of the linear time phi placement algorithms
> instead, but sadly, none were fast enough in practice (I believe
> someone later published a paper comparing them all and finding the
> same thing).
>
>
> I wrote the current implementation of the up-front SSA construction in
LLVM,
> found in Utils/PromoteMemoryToRegister.cpp. It uses a modified version of
> the Sreedhar-Gao algorithm from "A linear time algorithm for placing
> phi-nodes". The most interesting modification is that it uses
per-variable
> liveness information to compute pruned SSA by pruning the entire IDF
> computation, rather than merely pruning the DF -> IDF step like is
usually
> done. I was a bit surprised that it worked so well, given the reputation of
> the "linear time" algorithms, but it performed the best out of
everything I
> attempted.Interesting. Maybe some day i'll give it a try again.
I swear I had tried this one (I was always a fan of Sreedhar's work),
however, but it's entirely possible i messed up the implementation :)
>
> As a completely random aside, the algorithm given in the original
> paper you cited is actually not quite the best way in practice.  It
> performs more comparisons than strictly necessary. When I moved us to
> the algorithm found in http://www.cs.rice.edu/~keith/Embed/dom.pdf,
> Figure 5, it was ~30% faster (AFAIK, Jeanne Ferrante is somewhat
> responsible for both algorithms :P). It is also a lot simpler to
> explain, IMHO.
>
>
> There is a followup paper from other authors (including Tarjan himself)
that
> discusses some implementation tricks for the Lengauer-Tarjan algorithm:
Ah, i was referring to the *dominance frontier* computation, not the
dominator computation.
If you look, you'll see there is an algorithm in the paper that
computes dominance frontiers touching only the nodes that are
*actually in* the dominance frontier :)

The algorithm in the cytron/et al paper looks like this:

for each X in a bottom-up traversal of the dominator tree do
  DF(X) = empty
  for each Y in Successors(X)
    if (idom(Y) != X) then DF(X) += Y
  for each Z in DomChildren(X)
   for each Y in DF(Z)
     if (idom(Y) != X then DF(X) += Y

You can see that this does more comparisons than strictly necessary.
OTOH, the algorithm by Ferrante that Harvey gives is:

for each B in all basic blocks
  if the length of Predecessors(B) >= 2
    for each P in Predecessors(B)
     runner = P
     while runner != idom[B]
      DF(runner) += B
      runner = idom[runner]


You can see this does a lot less work by skipping a lot of useless nodes, etc.

It's also a lot simpler to explain :)
>
> http://www.cs.princeton.edu/~rwerneck/docs/GWTTA04.pdf
>
> They cite personal communication from Keith Cooper that a later more
careful
> implementation of Lengauer-Tarjan led to different benchmark results than
> what they published. I added those tricks and some others to LLVM's
> implementation of Lengauer-Tarjan (the simple O(n log n) version) for a
> significant performance improvement.
My only work on LLVM's dominators was when I successfully made a mess
of the dominator tree implementation in LLVM way back when by using a
splay-tree structure that stored the sequences, rather than BB's.   It
is technically faster to update, but in practice wildly complex to
maintain.
(See rev 25144)
:)
> when I should do a comparison with a careful
> implementation of the Cooper-Harvey-Kennedy algorithm some day.
We have always found the simple version of Lengauer-Tarjan + the
various tricks you are thinking of to beat out the
Cooper-Harvey-Kennedy algorithm.

We used to use the complex version, but as you also found, it was not
faster in practice than the O(n log n) + good engineering, and it was
a heck of a lot harder to maintain.
>
> Cameron

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Jan 2012 - [LLVMdev] dominance frontiers

[LLVMdev] dominance frontiers

[LLVMdev] dominance frontiers

[LLVMdev] dominance frontiers

Possibly Parallel Threads