On Fri, Jan 6, 2012 at 8:17 PM, Chris Lattner <clattner at apple.com> wrote:> > On Jan 6, 2012, at 5:08 PM, Chris Lattner wrote: > >>>> >>>> It's very like SSA construction, but must make provision >>>> testing anti dependences. I had planned to use dominance frontiers to >>>> guide placement of phi nodes, as usual. >>> >>> Ok, in that case, please check out include/llvm/Transforms/Utils/SSAUpdater.h, >>> which is the preferred API (that doesn't use dom frontiers) to build SSA. >> >> What approach does it use? I look at the code, but don't really see >> an explanation. > > Sorry, my reading comprehension skills were low, I thought you were asking about MemDep for some reason. SSAUpdater uses simple local scans across the CFG, excelling at local updates of values.Note: GCC takes exactly the same approach as LLVM here, for exactly the reason chris specifies. In fact, until we started local SSA updating (which is now many years ago, but ...), dominance frontier calculation for ssa updating was in the top 10 profile functions for GCC compiles of large source files. I had tried a number of the linear time phi placement algorithms instead, but sadly, none were fast enough in practice (I believe someone later published a paper comparing them all and finding the same thing). As a completely random aside, the algorithm given in the original paper you cited is actually not quite the best way in practice. It performs more comparisons than strictly necessary. When I moved us to the algorithm found in http://www.cs.rice.edu/~keith/Embed/dom.pdf, Figure 5, it was ~30% faster (AFAIK, Jeanne Ferrante is somewhat responsible for both algorithms :P). It is also a lot simpler to explain, IMHO. Both of these things happened around 2005 (local SSA updating in GCC and replacement of the dominance frontier algorithm), so things may be different now.
On Jan 6, 2012, at 8:27 PM, Daniel Berlin wrote:> Note: GCC takes exactly the same approach as LLVM here, for exactly > the reason chris specifies. > In fact, until we started local SSA updating (which is now many years > ago, but ...), dominance frontier calculation for ssa updating was in > the top 10 profile functions for GCC compiles of large source files. > I had tried a number of the linear time phi placement algorithms > instead, but sadly, none were fast enough in practice (I believe > someone later published a paper comparing them all and finding the > same thing).I wrote the current implementation of the up-front SSA construction in LLVM, found in Utils/PromoteMemoryToRegister.cpp. It uses a modified version of the Sreedhar-Gao algorithm from "A linear time algorithm for placing phi-nodes". The most interesting modification is that it uses per-variable liveness information to compute pruned SSA by pruning the entire IDF computation, rather than merely pruning the DF -> IDF step like is usually done. I was a bit surprised that it worked so well, given the reputation of the "linear time" algorithms, but it performed the best out of everything I attempted.> As a completely random aside, the algorithm given in the original > paper you cited is actually not quite the best way in practice. It > performs more comparisons than strictly necessary. When I moved us to > the algorithm found in http://www.cs.rice.edu/~keith/Embed/dom.pdf, > Figure 5, it was ~30% faster (AFAIK, Jeanne Ferrante is somewhat > responsible for both algorithms :P). It is also a lot simpler to > explain, IMHO.There is a followup paper from other authors (including Tarjan himself) that discusses some implementation tricks for the Lengauer-Tarjan algorithm: http://www.cs.princeton.edu/~rwerneck/docs/GWTTA04.pdf They cite personal communication from Keith Cooper that a later more careful implementation of Lengauer-Tarjan led to different benchmark results than what they published. I added those tricks and some others to LLVM's implementation of Lengauer-Tarjan (the simple O(n log n) version) for a significant performance improvement. I should do a comparison with a careful implementation of the Cooper-Harvey-Kennedy algorithm some day. Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120106/de67b49a/attachment.html>
On Sat, Jan 7, 2012 at 12:14 AM, Cameron Zwarich <zwarich at apple.com> wrote:> On Jan 6, 2012, at 8:27 PM, Daniel Berlin wrote: > > Note: GCC takes exactly the same approach as LLVM here, for exactly > the reason chris specifies. > In fact, until we started local SSA updating (which is now many years > ago, but ...), dominance frontier calculation for ssa updating was in > the top 10 profile functions for GCC compiles of large source files. > I had tried a number of the linear time phi placement algorithms > instead, but sadly, none were fast enough in practice (I believe > someone later published a paper comparing them all and finding the > same thing). > > > I wrote the current implementation of the up-front SSA construction in LLVM, > found in Utils/PromoteMemoryToRegister.cpp. It uses a modified version of > the Sreedhar-Gao algorithm from "A linear time algorithm for placing > phi-nodes". The most interesting modification is that it uses per-variable > liveness information to compute pruned SSA by pruning the entire IDF > computation, rather than merely pruning the DF -> IDF step like is usually > done. I was a bit surprised that it worked so well, given the reputation of > the "linear time" algorithms, but it performed the best out of everything I > attempted.Interesting. Maybe some day i'll give it a try again. I swear I had tried this one (I was always a fan of Sreedhar's work), however, but it's entirely possible i messed up the implementation :)> > As a completely random aside, the algorithm given in the original > paper you cited is actually not quite the best way in practice. It > performs more comparisons than strictly necessary. When I moved us to > the algorithm found in http://www.cs.rice.edu/~keith/Embed/dom.pdf, > Figure 5, it was ~30% faster (AFAIK, Jeanne Ferrante is somewhat > responsible for both algorithms :P). It is also a lot simpler to > explain, IMHO. > > > There is a followup paper from other authors (including Tarjan himself) that > discusses some implementation tricks for the Lengauer-Tarjan algorithm:Ah, i was referring to the *dominance frontier* computation, not the dominator computation. If you look, you'll see there is an algorithm in the paper that computes dominance frontiers touching only the nodes that are *actually in* the dominance frontier :) The algorithm in the cytron/et al paper looks like this: for each X in a bottom-up traversal of the dominator tree do DF(X) = empty for each Y in Successors(X) if (idom(Y) != X) then DF(X) += Y for each Z in DomChildren(X) for each Y in DF(Z) if (idom(Y) != X then DF(X) += Y You can see that this does more comparisons than strictly necessary. OTOH, the algorithm by Ferrante that Harvey gives is: for each B in all basic blocks if the length of Predecessors(B) >= 2 for each P in Predecessors(B) runner = P while runner != idom[B] DF(runner) += B runner = idom[runner] You can see this does a lot less work by skipping a lot of useless nodes, etc. It's also a lot simpler to explain :)> > http://www.cs.princeton.edu/~rwerneck/docs/GWTTA04.pdf > > They cite personal communication from Keith Cooper that a later more careful > implementation of Lengauer-Tarjan led to different benchmark results than > what they published. I added those tricks and some others to LLVM's > implementation of Lengauer-Tarjan (the simple O(n log n) version) for a > significant performance improvement.My only work on LLVM's dominators was when I successfully made a mess of the dominator tree implementation in LLVM way back when by using a splay-tree structure that stored the sequences, rather than BB's. It is technically faster to update, but in practice wildly complex to maintain. (See rev 25144) :)> when I should do a comparison with a careful > implementation of the Cooper-Harvey-Kennedy algorithm some day.We have always found the simple version of Lengauer-Tarjan + the various tricks you are thinking of to beat out the Cooper-Harvey-Kennedy algorithm. We used to use the complex version, but as you also found, it was not faster in practice than the O(n log n) + good engineering, and it was a heck of a lot harder to maintain.> > Cameron